Contents¶
Tolkein¶
Tree of Life Kit of Evolutionary Informatics Novelties¶
Installation¶
conda install -c tolkit tolkein
or
pip install tolkein
You can also install the in-development version with:
pip install https://github.com/tolkit/tolkein/archive/main.zip
Documentation¶
Reference¶
tobin¶
Binning functions.
tolkein.lib.tobin.readable_bin(value, *, start_digits=None)
Place values in human-readable bins.
Parameters: - value (float) – Number to bin.
- start_digits (list, optional) – List of threshold values between 1 and 10. Defaults to [1, 2, 3, 5]
>>> readable_bin(123456789)
'200M'
>>> readable_bin(56789012234)
'100G'
>>> readable_bin(567.89)
'1k'
>>> readable_bin(21, start_digits=[1, 2, 4, 8])
'40'
>>> readable_bin(-1234567)
'-2M'
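The documented outputs are consistent with rounding a value up in magnitude to the next threshold digit times a power of ten. The following is a minimal sketch of that idea, not the library's implementation: the name readable_bin_sketch, the suffix list, and the handling of zero and of magnitudes below 1 are all assumptions made for illustration.

```python
import math

# Assumed suffixes for successive powers of 1000 (not taken from tolkein).
SUFFIXES = ["", "k", "M", "G", "T", "P"]


def readable_bin_sketch(value, *, start_digits=None):
    """Round a number up in magnitude to the next threshold (hypothetical helper)."""
    if start_digits is None:
        start_digits = [1, 2, 3, 5]
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    if magnitude < 1:
        raise ValueError("this sketch only handles magnitudes of 1 or more")
    exponent = math.floor(math.log10(magnitude))
    mantissa = magnitude / 10 ** exponent
    # Pick the first integer threshold at or above the mantissa; if none
    # matches, fall through to 10, i.e. the start of the next power of ten.
    digit = next((d for d in sorted(start_digits) if d >= mantissa), 10)
    binned = digit * 10 ** exponent  # integer arithmetic from here on
    index = min((len(str(binned)) - 1) // 3, len(SUFFIXES) - 1)
    return f"{sign}{binned // 1000 ** index}{SUFFIXES[index]}"
```

The sketch assumes integer thresholds. How the real function treats values that fall exactly on a threshold is not stated in the reference, so the sketch's choice (keep the value in that bin) is a guess.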
tofetch¶
Fetch methods.
class tolkein.lib.tofetch.TqdmUpTo(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, lock_args=None, nrows=None, colour=None, delay=0, gui=False, **kwargs)
Provides update_to(n) which uses tqdm.update(delta_n).
From tqdm documentation.
tolkein.lib.tofetch.extract_tar(filename, path)
Extract tarred archive.
Parameters: - filename (str) – Name of tar file to extract.
- path (str) – Path to extract tar file.
Returns: Count of file members extracted from tar archive.
Return type: int
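As a rough illustration of the behaviour described for extract_tar (extract an archive and report how many members it held), here is a sketch built only on the standard library tarfile module. The function name extract_tar_sketch is hypothetical and the body is an assumption about the general approach, not a copy of tolkein's code.

```python
import tarfile


def extract_tar_sketch(filename, path):
    """Extract a tar archive into `path` and return its member count (hypothetical)."""
    # Mode "r:*" lets tarfile detect gzip, bz2 or xz compression itself.
    with tarfile.open(filename, "r:*") as archive:
        members = archive.getmembers()
        # For untrusted archives, consider the extraction filters
        # available in recent Python versions.
        archive.extractall(path)
    return len(members)
```

Note that directories inside the archive count as members too, so the returned number can exceed the number of regular files.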
tolkein.lib.tofetch.fetch_file(url, path, decode=True)
Fetch a remote file.
Parameters: - url (str) – Remote URL to fetch.
- path (str) – Path to write the fetched file to.
- decode (bool, optional) – Determines whether to unzip content. Defaults to True.
tolkein.lib.tofetch.fetch_stream(url, *, decode=True, show_progress=True)
Stream download.
Parameters: - url (str) – Remote URL to fetch.
- decode (bool, optional) – Determines whether to unzip content. Defaults to True.
- show_progress (bool, optional) – Show a progress bar to indicate file streaming progress. Defaults to True.
Yields: str – 1024 byte chunk of remote URL.
tolkein.lib.tofetch.fetch_tar(url, path)
Fetch and extract tarred archives.
Parameters: - url (str) – Remote URL to fetch.
- path (str) – Path to extract tar file.
Returns: Count of file members extracted from tar archive.
Return type: int
tofile¶
Read, write and parse files.
tolkein.lib.tofile.delete_file(filename)
Delete a file if it exists.
Parameters: filename (str) – Name of file to delete.
tolkein.lib.tofile.load_yaml(filename)
Parse a JSON/YAML file.
Parameters: filename (str) – Name of JSON/YAML file to parse.
Returns: Dict or list of file content.
tolkein.lib.tofile.open_file_handle(filename)
Open a filehandle.
Automatically detect gzipped files based on suffix.
Parameters: filename (str) – Name of file to read.
Returns: An open filehandle. Will return None if the file cannot be opened.
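Suffix-based gzip detection, as described for open_file_handle, can be sketched with the standard library alone. The name open_file_handle_sketch, the choice of text mode, and the exact exception caught are assumptions for illustration rather than details confirmed by the reference.

```python
import gzip


def open_file_handle_sketch(filename):
    """Open a text-mode handle, using gzip when the name ends in '.gz' (hypothetical)."""
    try:
        if str(filename).endswith(".gz"):
            return gzip.open(filename, "rt")
        return open(filename, "r")
    except OSError:
        # Mirror the documented behaviour of returning None on failure.
        return None
```

Because failure is signalled with None instead of an exception, callers need to check the result before reading from it.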
tolkein.lib.tofile.read_file(filename)
Read a whole file into memory.
Automatically detect gzipped files based on suffix.
Parameters: filename (str) – Name of file to read.
Returns: Content of file as a string. Will return None if file cannot be read.
Return type: str
tolkein.lib.tofile.stream_fasta(filename)
Stream a FASTA file, sequence by sequence.
Automatically detect gzipped files based on suffix.
Parameters: filename (str) – Name of FASTA file to read.
Yields: A tuple of:
( str: Sequence ID, str: Sequence string )
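To show what streaming a FASTA file sequence by sequence can look like, here is a small generator that yields (sequence ID, sequence string) tuples. It is a sketch, not tolkein's parser: the name stream_fasta_sketch is hypothetical, it takes an iterable of lines instead of a filename, and treating the first whitespace-delimited token of the header as the ID is an assumption.

```python
def stream_fasta_sketch(lines):
    """Yield (sequence_id, sequence) tuples from an iterable of FASTA lines."""
    seq_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first token after ">".
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if seq_id is not None:
        yield seq_id, "".join(chunks)
```

Only one record is held in memory at a time, which is the point of streaming: an open file handle can be passed directly as lines.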
tolkein.lib.tofile.write_file(filename, data, *, plain=False)
Write a file, use suffix to determine type and compression.
- types: ‘.json’, ‘.yaml’
- compression: None, ‘.gz’
Parameters: - filename (str) – Name of file to write.
- data – data to write to file.
- plain (bool, optional) – Whether to treat data as plain text. Defaults to False.
Returns: Whether file was written successfully.
Return type: bool
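Choosing the output format and compression from the file suffix could look roughly like the sketch below. It is illustrative only: write_file_sketch is a hypothetical name, only the '.json' type is handled (YAML needs a third-party library, so it is left out), and the fallback to plain text for other suffixes is an assumption.

```python
import gzip
import json


def write_file_sketch(filename, data, *, plain=False):
    """Write `data`, picking format and compression from the suffix (hypothetical)."""
    name = str(filename)
    compressed = name.endswith(".gz")
    stem = name[:-3] if compressed else name
    if not plain and stem.endswith(".json"):
        text = json.dumps(data, indent=2)
    else:
        # Assumption: anything else is written as plain text.
        text = str(data)
    opener = gzip.open if compressed else open
    try:
        with opener(name, "wt") as handle:
            handle.write(text)
    except OSError:
        # Report failure through the return value, as the reference describes.
        return False
    return True
```

With this sketch, write_file_sketch("assembly.json.gz", {"a": 1}) would produce gzipped JSON, while plain=True would skip the JSON serialisation.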
toinsdc¶
INSDC methods.
tolkein.lib.toinsdc.count_taxon_assembly_meta(root)
Count INSDC assemblies descended from root taxon.
Parameters: root (int) – Root taxon taxid.
Returns: Count of assemblies for taxa descended from root. Will return None on error.
Return type: int
tolkein.lib.toinsdc.fetch_wgs_assembly_meta(root, *, count=-1, offset=0, page=10000)
Query INSDC WGS assemblies descended from root taxon.
Parameters: - root (int) – Root taxon taxid.
- count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
- offset (int) – Offset of first assembly to return. Defaults to 0.
- page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields: dict – A dict of INSDC WGS assembly metadata keyed on sample accession.
tolkein.lib.toinsdc.stream_taxon_assembly_meta(root, *, count=-1, offset=0, page=10000)
Query INSDC assemblies descended from root taxon.
Parameters: - root (int) – Root taxon taxid.
- count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
- offset (int) – Offset of first assembly to return. Defaults to 0.
- page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields: dict – Normalised dict of INSDC metadata.
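The count, offset and page parameters shared by the two streaming functions above suggest a paging loop along the following lines. This is a sketch of the general pattern, not the library's code: paged_results and the fetch_page(offset, limit) callable are hypothetical stand-ins for whatever performs the actual API request.

```python
def paged_results(fetch_page, *, count=-1, offset=0, page=10000):
    """Yield up to `count` records, requesting at most `page` per call.

    `fetch_page(offset, limit)` is a hypothetical callable that returns a
    list of records; a count of -1 means "keep going until exhausted".
    """
    yielded = 0
    while count < 0 or yielded < count:
        # Never ask for more records than are still wanted.
        limit = page if count < 0 else min(page, count - yielded)
        batch = fetch_page(offset + yielded, limit)
        if not batch:
            break
        for record in batch:
            yield record
            yielded += 1
        # A short batch means the source has no more records.
        if len(batch) < limit:
            break
```

For example, with count=7, offset=5 and page=3 the loop issues three requests (3, 3 and 1 records) and stops without fetching past the requested range.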
tolog¶
Log events.
totax¶
Taxonomy methods.
tolkein.lib.totax.parse_ncbi_names_dmp(path, nodes)
Parse names.dmp file and add to nodes dict.
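NCBI names.dmp rows hold four fields (taxid, name, unique name, name class) separated by tab-pipe-tab and terminated by tab-pipe. The sketch below shows one way to fold such rows into a dict of nodes. The function name add_names_sketch and the layout of the nodes dict (keyed on taxid string, with "scientific_name" and "names" keys) are assumptions; the reference does not document the structure tolkein uses.

```python
def add_names_sketch(lines, nodes):
    """Add names from names.dmp-style lines to a dict of nodes (hypothetical layout)."""
    for line in lines:
        line = line.rstrip("\n")
        # Drop the row terminator before splitting on the field separator.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue
        taxid, name, _unique_name, name_class = fields[:4]
        # Assumption: only annotate taxa that are already in the dict.
        if taxid not in nodes:
            continue
        node = nodes[taxid]
        if name_class == "scientific name":
            node["scientific_name"] = name
        else:
            node.setdefault("names", []).append(
                {"name": name, "class": name_class}
            )
    return nodes
```

Skipping taxids that are absent from nodes keeps the result restricted to whatever subtree was loaded beforehand, which is a design choice of this sketch.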
tolkein.lib.totax.parse_ott_names_dmp(path, nodes)
Parse synonyms.tsv file and add to nodes dict.
tolkein.lib.totax.parse_taxonomy(taxonomy_type, path, root=None)
Parse taxonomy into list of dicts.
Contributing¶
Bug reports¶
When reporting a bug please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
Documentation improvements¶
Contributions to the official Tolkein docs and internal docstrings are always welcome.
Feature requests and feedback¶
The best way to send feedback is to file an issue at https://github.com/tolkit/tolkein/issues.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-driven project, and that code contributions are welcome :)
Development¶
To set up tolkein for local development:
Fork tolkein (look for the “Fork” button).
Clone your fork locally:
git clone git@github.com:USERNAME/tolkein.git
Create a branch for local development:
git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, run all the checks and the docs builder with one tox command:
tox
Commit your changes and push your branch to GitHub:
git add .
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines¶
If you need some code review or feedback while you’re developing the code just make the pull request.
For merging, you should:
- Include passing tests (run tox) [1].
- Update documentation when there’s new API, functionality etc.
- Add a note to CHANGELOG.rst about the changes.
- Add yourself to AUTHORS.rst.
[1] If you don’t have all the necessary Python versions available locally you can rely on Travis: it will run the tests for each change you add in the pull request. It will be slower though …
Tips¶
To run a subset of tests:
tox -e envname -- pytest -k test_myfeature
To run all the test environments in parallel:
tox -p
Authors¶
- Richard Challis - https://twitter.com/rjchallis