Reference
tobin
Binning functions.
tolkein.lib.tobin.readable_bin(value, *, start_digits=None)
Place values in human readable bins.
Parameters:
- value (float) – Number to bin.
- start_digits (list, optional) – List of threshold values between 1 and 10. Defaults to [1, 2, 3, 5].
>>> readable_bin(123456789)
'200M'
>>> readable_bin(56789012234)
'100G'
>>> readable_bin(567.89)
'1k'
>>> readable_bin(21, start_digits=[1, 2, 4, 8])
'40'
>>> readable_bin(-1234567)
'-2M'
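The examples above are consistent with rounding the magnitude up to the next threshold digit and then attaching an SI-style suffix. The following is a minimal sketch of that idea, not the library's implementation: the name sketch_readable_bin and the SUFFIXES table are hypothetical, and the handling of zero, of values below 1, and of values that sit exactly on a threshold is an assumption.

```python
import math

# Hypothetical suffix table; the suffixes the real function supports are not documented here.
SUFFIXES = ["", "k", "M", "G", "T", "P"]


def sketch_readable_bin(value, *, start_digits=None):
    """Round |value| up to the next threshold digit and format it with a suffix."""
    if start_digits is None:
        start_digits = [1, 2, 3, 5]
    if value == 0:
        return "0"  # assumption: zero is returned unchanged
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    exponent = math.floor(math.log10(magnitude))
    mantissa = magnitude / 10 ** exponent  # in the range [1, 10)
    # First threshold at or above the mantissa; 10 means "roll over to the next decade".
    digit = next((d for d in sorted(start_digits) if d >= mantissa), 10)
    if digit == 10:
        digit, exponent = 1, exponent + 1
    # Group exponents in threes to choose the suffix.
    group = min(max(exponent // 3, 0), len(SUFFIXES) - 1)
    scaled = digit * 10 ** (exponent - 3 * group)
    return f"{sign}{scaled:g}{SUFFIXES[group]}"
```

Under these assumptions the sketch reproduces the five documented examples, including the custom start_digits case.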
tofetch
Fetch methods.
class tolkein.lib.tofetch.TqdmUpTo(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, lock_args=None, nrows=None, colour=None, gui=False, **kwargs)
Provides update_to(n) which uses tqdm.update(delta_n).
From tqdm documentation.
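The update_to(n) pattern converts an absolute position into the incremental delta that a progress bar expects. The sketch below illustrates the conversion with a hypothetical stand-in class, ProgressCounter, so that it runs without tqdm installed. It is not the TqdmUpTo class itself, and the total argument is an assumption.

```python
class ProgressCounter:
    """Hypothetical stand-in for a progress bar that tracks a running count n."""

    def __init__(self, total=None):
        self.n = 0
        self.total = total

    def update(self, delta=1):
        # Incremental update, analogous to tqdm.update(delta_n).
        self.n += delta

    def update_to(self, n, total=None):
        # Absolute update: turn the new position into a delta from the current count.
        if total is not None:
            self.total = total
        self.update(n - self.n)


counter = ProgressCounter()
counter.update_to(1024, total=4096)  # first report: 1024 of 4096 bytes seen
counter.update_to(3072)              # later report: only the 2048-byte difference is added
```

This shape suits download callbacks that report the total bytes received so far rather than the size of the latest chunk.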
tolkein.lib.tofetch.extract_tar(filename, path)
Extract tarred archive.
Parameters:
- filename (str) – Name of tar file to extract.
- path (str) – Path to extract the tar file to.
Returns: Count of file members extracted from tar archive.
Return type: int
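A function with this contract can be sketched with the standard library tarfile module. This is an illustration only: sketch_extract_tar is a hypothetical name, and counting every archive member (directories included) is an assumption about what "file members" means.

```python
import tarfile


def sketch_extract_tar(filename, path):
    """Extract every member of a tar archive into path and return the member count."""
    # Mode "r:*" lets tarfile detect gzip, bz2 or xz compression on its own.
    with tarfile.open(filename, "r:*") as archive:
        members = archive.getmembers()
        archive.extractall(path)
    return len(members)
```

For archives from untrusted sources, recent Python versions accept a filter argument to extractall (for example filter="data") that rejects unsafe member paths.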
tolkein.lib.tofetch.fetch_file(url, path, decode=True)
Fetch a remote file.
Parameters:
- url (str) – Remote URL to fetch.
- path (str) – Path to write the fetched file to.
- decode (bool, optional) – Determines whether to unzip content. Defaults to True.
tolkein.lib.tofetch.fetch_stream(url, *, decode=True, show_progress=True)
Stream download.
Parameters:
- url (str) – Remote URL to fetch.
- decode (bool, optional) – Determines whether to unzip content. Defaults to True.
- show_progress (bool, optional) – Show a progress bar to indicate file streaming progress. Defaults to True.
Yields: str – 1024-byte chunk of the content at the remote URL.
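The fixed-size chunking that a streaming download relies on can be sketched against any binary file object. The name sketch_stream_chunks is hypothetical, the sketch yields bytes rather than str, and it leaves out decoding and progress reporting. A response object returned by urllib.request.urlopen exposes the same read(size) method.

```python
import io


def sketch_stream_chunks(handle, chunk_size=1024):
    """Yield successive chunks of at most chunk_size bytes from a binary file object."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        yield chunk


# An in-memory stream stands in for a remote response here.
chunks = list(sketch_stream_chunks(io.BytesIO(b"x" * 2500)))
chunk_sizes = [len(chunk) for chunk in chunks]
```

Only the final chunk is shorter than chunk_size, so a consumer can process the stream without holding the whole download in memory.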
tolkein.lib.tofetch.fetch_tar(url, path)
Fetch and extract tarred archives.
Parameters:
- url (str) – Remote URL to fetch.
- path (str) – Path to extract the tar file to.
Returns: Count of file members extracted from tar archive.
Return type: int
tofile
Read, write and parse files.
tolkein.lib.tofile.delete_file(filename)
Delete a file if it exists.
Parameters: filename (str) – Name of file to delete.
tolkein.lib.tofile.load_yaml(filename)
Parse a JSON/YAML file.
Parameters: filename (str) – Name of JSON/YAML file to parse.
Returns: Dict or list of file content.
tolkein.lib.tofile.open_file_handle(filename)
Open a filehandle.
Automatically detect gzipped files based on suffix.
Parameters: filename (str) – Name of file to read.
Returns: An open filehandle. Will return None if the file cannot be opened.
tolkein.lib.tofile.read_file(filename)
Read a whole file into memory.
Automatically detect gzipped files based on suffix.
Parameters: filename (str) – Name of file to read.
Returns: Content of file as a string. Will return None if file cannot be read.
Return type: str
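Suffix-based gzip detection, as described for open_file_handle and read_file, can be sketched with the standard library. The function names below are hypothetical, and both the ".gz" suffix check and the use of text mode are assumptions about the behaviour described above.

```python
import gzip


def sketch_open_file_handle(filename):
    """Return a text-mode handle, using gzip for '.gz' files, or None on failure."""
    try:
        if filename.endswith(".gz"):
            return gzip.open(filename, "rt")
        return open(filename, "r")
    except OSError:
        # Covers missing files and permission errors.
        return None


def sketch_read_file(filename):
    """Read a whole file into a string, or return None if it cannot be opened."""
    handle = sketch_open_file_handle(filename)
    if handle is None:
        return None
    with handle:
        return handle.read()
```

Returning None instead of raising means callers need an explicit None check before using the result.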
tolkein.lib.tofile.stream_fasta(filename)
Stream a FASTA file, sequence by sequence.
Automatically detect gzipped files based on suffix.
Parameters: filename (str) – Name of FASTA file to read.
Yields: A tuple of (str: Sequence ID, str: Sequence string).
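Sequence-by-sequence streaming can be sketched as a generator over lines. To stay self-contained, the hypothetical sketch_stream_fasta below takes any iterable of lines rather than a filename. Treating the first whitespace-delimited token of the header as the sequence ID is an assumption.

```python
def sketch_stream_fasta(lines):
    """Yield (sequence ID, sequence string) tuples from an iterable of FASTA lines."""
    seq_id = None
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(parts)
            header = line[1:].split()
            seq_id = header[0] if header else ""
            parts = []
        elif seq_id is not None:
            # Sequence data may be wrapped over several lines.
            parts.append(line)
    if seq_id is not None:
        yield seq_id, "".join(parts)


records = list(
    sketch_stream_fasta([">seq1 description\n", "ACGT\n", "TTAA\n", ">seq2\n", "GG\n"])
)
```

Because each record is yielded as soon as the next header appears, only one sequence is held in memory at a time.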
tolkein.lib.tofile.write_file(filename, data, *, plain=False)
Write a file, use suffix to determine type and compression.
- types: ‘.json’, ‘.yaml’
- compression: None, ‘.gz’
Parameters:
- filename (str) – Name of file to write.
- data – Data to write to file.
- plain (bool, optional) – Whether to treat data as plain text. Defaults to False.
Returns: Whether file was written successfully.
Return type: bool
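Choosing the serialisation and compression from the filename suffix can be sketched as follows. The name sketch_write_file is hypothetical. The sketch handles only JSON and plain text, because YAML output would need a third-party library, and the fallback to str(data) is an assumption.

```python
import gzip
import json


def sketch_write_file(filename, data, *, plain=False):
    """Write data to filename, picking format and compression from the suffix."""
    compressed = filename.endswith(".gz")
    base = filename[:-3] if compressed else filename
    if not plain and base.endswith(".json"):
        text = json.dumps(data, indent=2)
    else:
        # Assumption: anything else is written as its string form.
        text = str(data)
    opener = gzip.open if compressed else open
    try:
        with opener(filename, "wt") as handle:
            handle.write(text)
    except OSError:
        return False
    return True
```

Stripping ".gz" before checking the type lets a name such as "meta.json.gz" resolve to gzip-compressed JSON.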
toinsdc
INSDC methods.
tolkein.lib.toinsdc.count_taxon_assembly_meta(root)
Count INSDC assemblies descended from root taxon.
Parameters: root (int) – Root taxon taxid.
Returns: Count of assemblies for taxa descended from root. Will return None on error.
Return type: int
tolkein.lib.toinsdc.fetch_wgs_assembly_meta(root, *, count=-1, offset=0, page=10000)
Query INSDC WGS assemblies descended from root taxon.
Parameters:
- root (int) – Root taxon taxid.
- count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
- offset (int) – Offset of first assembly to return. Defaults to 0.
- page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields: dict – A dict of INSDC WGS assembly metadata keyed on sample accession.
tolkein.lib.toinsdc.stream_taxon_assembly_meta(root, *, count=-1, offset=0, page=10000)
Query INSDC assemblies descended from root taxon.
Parameters:
- root (int) – Root taxon taxid.
- count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
- offset (int) – Offset of first assembly to return. Defaults to 0.
- page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields: dict – Normalised dict of INSDC metadata.
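The way count, offset and page interact can be sketched as a generic paging generator. Everything here is hypothetical: sketch_stream_paged, the fetch_page(offset, limit) callable, and the in-memory fake_api that stands in for the remote service. The sketch makes no claim about the real INSDC API.

```python
def sketch_stream_paged(fetch_page, *, count=-1, offset=0, page=10000):
    """Yield up to count records (all if count is -1), fetching page records per request."""
    remaining = count
    while remaining != 0:
        # Never request more than is still needed.
        limit = page if remaining < 0 else min(page, remaining)
        records = list(fetch_page(offset, limit))[:limit]
        if not records:
            break
        yield from records
        offset += len(records)
        if remaining > 0:
            remaining -= len(records)
        if len(records) < limit:
            break  # a short page means the source is exhausted


# Hypothetical in-memory "API" used only for demonstration.
DATA = [{"id": i} for i in range(25)]


def fake_api(offset, limit):
    return DATA[offset:offset + limit]


everything = list(sketch_stream_paged(fake_api, page=10))
subset = list(sketch_stream_paged(fake_api, count=7, offset=3, page=5))
```

With page=5 and count=7, the sketch issues one request for five records and a second for the remaining two, so the final request is trimmed rather than over-fetching.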
tolog
Log events.