Reference

tobin

Binning functions.

tolkein.lib.tobin.readable_bin(value, *, start_digits=None)

Place values in human readable bins.

Parameters:
  • value (float) – Number to bin.
  • start_digits (list, optional) – List of threshold values between 1 and 10. Defaults to [1, 2, 3, 5].
>>> readable_bin(123456789)
'200M'

>>> readable_bin(56789012234)
'100G'

>>> readable_bin(567.89)
'1k'

>>> readable_bin(21, start_digits=[1, 2, 4, 8])
'40'

>>> readable_bin(-1234567)
'-2M'
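The examples above are consistent with rounding the magnitude up to the smallest threshold of the form digit × 10^k that is at least the value, then adding an SI-style suffix. The following is a minimal sketch of that idea, not the tolkein implementation. The name `readable_bin_sketch`, the suffix table, the tie-breaking (a value exactly on a threshold keeps that threshold), the restriction to integer digits and the clamping of values below 1 are all assumptions.

```python
import math

# Assumed suffix table: (power of ten, suffix), largest first.
_SUFFIXES = ((12, "T"), (9, "G"), (6, "M"), (3, "k"))


def _format_threshold(threshold):
    """Render an integer threshold with an SI-style suffix."""
    for exponent, suffix in _SUFFIXES:
        if threshold >= 10**exponent:
            return f"{threshold // 10**exponent}{suffix}"
    return str(threshold)


def readable_bin_sketch(value, *, start_digits=None):
    """Round abs(value) up to the next digit * 10**k threshold (hypothetical)."""
    if start_digits is None:
        start_digits = [1, 2, 3, 5]  # assumed to be integers in this sketch
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    # Clamp to 10**0 so the sketch only produces integer thresholds >= 1.
    power = max(math.floor(math.log10(magnitude)), 0)
    # Check this decade and the next: a value above the largest threshold
    # in this decade rolls over to the first threshold of the next one.
    for exponent in (power, power + 1):
        for digit in sorted(start_digits):
            threshold = digit * 10**exponent
            if threshold >= magnitude:
                return sign + _format_threshold(threshold)
    raise ValueError("start_digits must contain values between 1 and 10")
```

With these rules the sketch reproduces the five examples above, for instance `readable_bin_sketch(21, start_digits=[1, 2, 4, 8])` gives `'40'`.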

tofetch

Fetch methods.

class tolkein.lib.tofetch.TqdmUpTo(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, lock_args=None, nrows=None, colour=None, gui=False, **kwargs)

Provides update_to(n), which uses tqdm.update(delta_n).

Adapted from the tqdm documentation.

update_to(b=1, bsize=1, tsize=None)

Move the progress bar to b * bsize units, optionally setting the total to tsize.

Parameters:
  • b (int, optional) – Number of blocks transferred so far [default: 1].
  • bsize (int, optional) – Size of each block (in tqdm units) [default: 1].
  • tsize (int, optional) – Total size (in tqdm units).
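`update_to` turns an absolute position (`b * bsize`) into the relative increment that `tqdm.update` expects. The sketch below shows that arithmetic with a small stand-in class so that it runs without tqdm installed; `ProgressSketch` is hypothetical, while the body of `update_to` follows the pattern given in the tqdm documentation. The `(b, bsize, tsize)` signature matches the `reporthook` callback of `urllib.request.urlretrieve`, which is called with the block number, the block size and the total size.

```python
class ProgressSketch:
    """Hypothetical stand-in for tqdm exposing only n, total and update()."""

    def __init__(self, total=None):
        self.n = 0          # units completed so far
        self.total = total  # expected number of units, if known

    def update(self, delta_n=1):
        # tqdm.update() takes an increment, not an absolute position.
        self.n += delta_n

    def update_to(self, b=1, bsize=1, tsize=None):
        # b blocks of bsize units each have been transferred so far.
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)


bar = ProgressSketch()
for block in range(1, 4):
    bar.update_to(block, 1024, tsize=3072)
```

With tqdm installed, the same `update_to` body sits on a `tqdm.tqdm` subclass, and the bound method can be passed as `reporthook=bar.update_to`.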
tolkein.lib.tofetch.extract_tar(filename, path)

Extract tarred archive.

Parameters:
  • filename (str) – Name of tar file to extract.
  • path (str) – Directory to extract the tar file into.
Returns:

Count of file members extracted from tar archive.

Return type:

int
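A minimal sketch of the behaviour described for `extract_tar`, using the standard-library `tarfile` module; it is not the tolkein source. Passing `filter="data"` (available when `tarfile.data_filter` exists: Python 3.12 and recent security releases of earlier versions) is an added safeguard against members that try to write outside `path`. The function name is hypothetical.

```python
import tarfile


def extract_tar_sketch(filename, path):
    """Extract filename into the directory path and return the member count."""
    with tarfile.open(filename) as archive:  # compression is auto-detected
        members = archive.getmembers()
        if hasattr(tarfile, "data_filter"):
            # Reject absolute paths, ".." components and special files.
            archive.extractall(path, filter="data")
        else:
            archive.extractall(path)
    return len(members)
```

Directories stored in the archive count as members too, so the returned count can exceed the number of regular files.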

tolkein.lib.tofetch.fetch_file(url, path, decode=True)

Fetch a remote file.

Parameters:
  • url (str) – Remote URL to fetch.
  • path (str) – Local path to write the fetched file to.
  • decode (bool, optional) – Determines whether to unzip content. Defaults to True.
tolkein.lib.tofetch.fetch_stream(url, *, decode=True, show_progress=True)

Stream download.

Parameters:
  • url (str) – Remote URL to fetch.
  • decode (bool, optional) – Determines whether to unzip content. Defaults to True.
  • show_progress (bool, optional) – Show a progress bar to indicate file streaming progress. Defaults to True.
Yields:

str – 1024-byte chunk of the remote URL content.
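The chunked streaming described for `fetch_stream` can be sketched with the standard library alone. This is not the tolkein implementation: it assumes that `decode=True` means gzip decompression, yields `bytes` rather than `str`, omits the progress bar, and adds a hypothetical `chunk_size` parameter. Decompressed chunks are not exactly 1024 bytes, because each compressed chunk expands by a variable amount.

```python
import urllib.request
import zlib


def fetch_stream_sketch(url, *, decode=True, chunk_size=1024):
    """Yield the content of url in chunks, gunzipping it if decode is True."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS) if decode else None
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # end of the response body
            if decompressor is not None:
                chunk = decompressor.decompress(chunk)
            if chunk:  # a compressed chunk may not produce output yet
                yield chunk
    if decompressor is not None:
        tail = decompressor.flush()  # emit any buffered output
        if tail:
            yield tail
```

A `file://` URL, for example from `pathlib.Path.as_uri()`, is a convenient way to try the sketch offline, since `urllib.request.urlopen` accepts it.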

tolkein.lib.tofetch.fetch_tar(url, path)

Fetch and extract tarred archives.

Parameters:
  • url (str) – Remote URL to fetch.
  • path (str) – Directory to extract the tar file into.
Returns:

Count of file members extracted from tar archive.

Return type:

int

tolkein.lib.tofetch.fetch_tmp_file(url)

Fetch a remote URL to a temporary file.

Parameters: url (str) – Remote URL to fetch.
Returns: Temporary filename.
Return type: str
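One way to implement `fetch_tmp_file` is to copy the response into a `tempfile.NamedTemporaryFile` created with `delete=False`, so that the file outlives the function and the caller becomes responsible for removing it. The sketch below assumes that approach; the function name and the `suffix` parameter are hypothetical.

```python
import shutil
import tempfile
import urllib.request


def fetch_tmp_file_sketch(url, suffix=""):
    """Download url into a new temporary file and return its name."""
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed.
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
    return tmp.name
```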
tolkein.lib.tofetch.fetch_url(url)

Fetch a URL.

Parameters: url (str) – Remote URL to fetch.
Returns: Content of file as a string. Will return None if the response is not OK.
Return type: str
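With `urllib.request`, a non-2xx HTTP status surfaces as an `urllib.error.HTTPError` exception rather than as a return value, so a sketch of the "return None if the response is not OK" behaviour catches it. This sketch also returns None on connection errors by catching the broader `URLError`. The function name and the UTF-8 default are assumptions; tolkein may use a different HTTP client.

```python
import urllib.error
import urllib.request


def fetch_url_sketch(url, encoding="utf-8"):
    """Return the body of url as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode(encoding)
    except urllib.error.URLError:
        # HTTPError (non-2xx status) is a subclass of URLError.
        return None
```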

tofile

Read, write and parse files.

tolkein.lib.tofile.delete_file(filename)

Delete a file if it exists.

Parameters: filename (str) – Name of file to delete.
tolkein.lib.tofile.load_yaml(filename)

Parse a JSON/YAML file.

Parameters: filename (str) – Name of JSON/YAML file to parse.
Returns: Dict or list of file content.
tolkein.lib.tofile.open_file_handle(filename)

Open a filehandle.

Automatically detect gzipped files based on suffix.

Parameters: filename (str) – Name of file to read.
Returns: An open filehandle. Will return None if the file cannot be opened.
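Suffix-based gzip detection can be sketched with `gzip.open` in text mode. The sketch assumes that the relevant suffix is `.gz` and that the handle should yield text; both are assumptions, as is the function name.

```python
import gzip


def open_file_handle_sketch(filename):
    """Open filename for reading text, gunzipping it if it ends in .gz."""
    try:
        if str(filename).endswith(".gz"):
            # gzip.open only validates the gzip header on the first read.
            return gzip.open(filename, "rt")
        return open(filename, "r")
    except OSError:
        return None  # e.g. missing file or permission denied
```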
tolkein.lib.tofile.read_file(filename)

Read a whole file into memory.

Automatically detect gzipped files based on suffix.

Parameters: filename (str) – Name of file to read.
Returns: Content of file as a string. Will return None if file cannot be read.
Return type: str
tolkein.lib.tofile.stream_fasta(filename)

Stream a FASTA file, sequence by sequence.

Automatically detect gzipped files based on suffix.

Parameters:

filename (str) – Name of FASTA file to read.

Yields:

tuple – A (sequence ID, sequence string) pair, both str.
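A FASTA record is a `>` header line followed by one or more sequence lines, so a streaming reader only needs to hold the current record in memory. A minimal sketch, assuming that the sequence ID is the first whitespace-delimited word of the header and that sequence lines are concatenated without line breaks. It accepts any iterable of lines, so gzip handling is left to the caller; the function name is hypothetical.

```python
def stream_fasta_lines(lines):
    """Yield (sequence ID, sequence string) tuples from FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)  # emit the previous record
            # Assume the ID is the first word of the header.
            fields = line[1:].split(maxsplit=1)
            seq_id, chunks = (fields[0] if fields else ""), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)  # emit the final record
```

For a file on disk, pass the open handle: `with open(path) as fh: for seq_id, seq in stream_fasta_lines(fh): ...`.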
tolkein.lib.tofile.write_file(filename, data, *, plain=False)

Write a file, using the filename suffix to determine type and compression.

  • types: ‘.json’, ‘.yaml’
  • compression: None, ‘.gz’
Parameters:
  • filename (str) – Name of file to write.
  • data – Data to write to file.
  • plain (bool, optional) – Whether to treat data as plain text. Defaults to False.
Returns:

Whether file was written successfully.

Return type:

bool
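The suffix rules listed above can be sketched as a small dispatcher: strip an optional `.gz`, then choose a serialiser from the remaining suffix. This sketch handles `.json` and plain text only, because YAML output needs a third-party library such as PyYAML; the function name, the JSON indentation and the `ValueError` for other suffixes are assumptions.

```python
import gzip
import json


def write_file_sketch(filename, data, *, plain=False):
    """Write data to filename, serialising and compressing by suffix."""
    compress = filename.endswith(".gz")
    base = filename[:-3] if compress else filename
    if plain:
        text = str(data)
    elif base.endswith(".json"):
        text = json.dumps(data, indent=1)
    else:
        # YAML (and anything else) is deliberately omitted from this sketch.
        raise ValueError(f"unsupported file type: {filename}")
    try:
        opener = gzip.open if compress else open
        with opener(filename, "wt") as fh:
            fh.write(text)
    except OSError:
        return False
    return True
```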

toinsdc

INSDC methods.

tolkein.lib.toinsdc.count_taxon_assembly_meta(root)

Count INSDC assemblies descended from root taxon.

Parameters: root (int) – Root taxon taxid.
Returns: Count of assemblies for taxa descended from root. Will return None on error.
Return type: int
tolkein.lib.toinsdc.fetch_wgs_assembly_meta(root, *, count=-1, offset=0, page=10000)

Query INSDC WGS assemblies descended from root taxon.

Parameters:
  • root (int) – Root taxon taxid.
  • count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
  • offset (int) – Offset of first assembly to return. Defaults to 0.
  • page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields:

dict – A dict of INSDC WGS assembly metadata keyed on sample accession.
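The `count`, `offset` and `page` parameters describe how a large result set is split across API requests. That arithmetic can be sketched as a generator of `(offset, limit)` pairs. The sketch assumes that the total number of matching records is known in advance (it could come from a count query) and does not reflect the actual INSDC request format; the function name is hypothetical.

```python
def page_windows(total, *, count=-1, offset=0, page=10000):
    """Yield (offset, limit) pairs covering the requested records."""
    # count=-1 means "everything from offset onwards".
    end = total if count < 0 else min(total, offset + count)
    start = offset
    while start < end:
        limit = min(page, end - start)  # last page may be partial
        yield start, limit
        start += limit
```

For 25000 matching records and the defaults, this gives three requests: `(0, 10000)`, `(10000, 10000)` and `(20000, 5000)`.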

tolkein.lib.toinsdc.stream_taxon_assembly_meta(root, *, count=-1, offset=0, page=10000)

Query INSDC assemblies descended from root taxon.

Parameters:
  • root (int) – Root taxon taxid.
  • count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
  • offset (int) – Offset of first assembly to return. Defaults to 0.
  • page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields:

dict – Normalised dict of INSDC metadata.

tolog

Log events.

class tolkein.lib.tolog.DisableLogger

Logger context management.

>>> my_logger = logger()
>>> my_logger.info('Print log messages')
>>> with DisableLogger():
...     my_logger.info('Disable log messages')
>>> my_logger.info('Print log messages again')
__enter__()

Set logging level to critical.

__exit__(x, y, z)

Set logging level back to default.
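A context manager with this behaviour can be sketched with `logging.disable`, which suppresses every message at or below the given level for all loggers. Restoring with `logging.disable(logging.NOTSET)` is an assumption about what "back to default" means here; the class name is hypothetical.

```python
import logging


class DisableLoggerSketch:
    """Silence all logging inside a with block."""

    def __enter__(self):
        # Drop every record at CRITICAL level or below, i.e. all of them.
        logging.disable(logging.CRITICAL)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        logging.disable(logging.NOTSET)  # re-enable all levels
        return False  # do not swallow exceptions raised in the block
```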

tolkein.lib.tolog.logger(name='tolkein')

Create logger.

Parameters: name (str, optional) – Logger name. Defaults to “tolkein”.
Returns: A logger instance.
Return type: logging.Logger

totax

Taxonomy methods.

tolkein.lib.totax.parse_ncbi_names_dmp(path, nodes)

Parse an NCBI format names.dmp file and add the names to the nodes dict.

tolkein.lib.totax.parse_ncbi_nodes_dmp(path)

Parse NCBI format nodes.dmp file.
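In the NCBI taxdump format, each line of `nodes.dmp` is one record whose fields are separated by `\t|\t` and terminated by `\t|`; the first three fields are the taxon ID, the parent taxon ID and the rank. A minimal parsing sketch that keeps only those three fields as strings; the function name and the output dict layout are assumptions, not necessarily what tolkein builds.

```python
def parse_nodes_lines(lines):
    """Map taxon ID -> {"parent": parent ID, "rank": rank} from nodes.dmp lines."""
    nodes = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the record terminator
        fields = line.split("\t|\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        taxid, parent, rank = fields[0], fields[1], fields[2]
        nodes[taxid] = {"parent": parent, "rank": rank}
    return nodes
```

With a real dump: `with open("nodes.dmp") as fh: nodes = parse_nodes_lines(fh)`.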

tolkein.lib.totax.parse_ncbi_taxdump(path, root=None)

Expand lineages from nodes dict.
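Expanding a lineage from a nodes dict amounts to following parent links until the root is reached; in the NCBI taxonomy the root is taxon 1, which is its own parent. A sketch assuming a `{taxid: {"parent": ..., "rank": ...}}` layout (an assumption, not necessarily tolkein's internal structure), with a guard against cycles. The function name and the taxon IDs in the example are hypothetical.

```python
def expand_lineage(taxid, nodes):
    """Return [(taxid, rank), ...] from taxid up to the root."""
    lineage = []
    seen = set()  # guards against cycles in malformed data
    while taxid in nodes and taxid not in seen:
        seen.add(taxid)
        node = nodes[taxid]
        lineage.append((taxid, node["rank"]))
        if node["parent"] == taxid:  # the root is its own parent
            break
        taxid = node["parent"]
    return lineage


example_nodes = {
    "1": {"parent": "1", "rank": "no rank"},
    "10": {"parent": "1", "rank": "kingdom"},
    "20": {"parent": "10", "rank": "phylum"},
}
```

Restricting output to a `root` taxon, as the `root` argument suggests, could then amount to keeping only lineages that contain it.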

tolkein.lib.totax.parse_taxonomy(taxonomy_type, path, root=None)

Parse taxonomy into list of dicts.