Tolkein

Tree of Life Kit of Evolutionary Informatics Novelties

Installation

conda install -c tolkit tolkein

or

pip install tolkein

You can also install the in-development version with:

pip install https://github.com/tolkit/tolkein/archive/main.zip

Development

To run all tests, run:

tox

Usage

To use Tolkein in a project:

import tolkein

Reference

tobin

Binning functions.

tolkein.lib.tobin.readable_bin(value, *, start_digits=None)

Place values in human readable bins.

Parameters:
  • value (float) – Number to bin.
  • start_digits (list, optional) – List of threshold values between 1 and 10. Defaults to [1, 2, 3, 5].

>>> readable_bin(123456789)
'200M'

>>> readable_bin(56789012234)
'100G'

>>> readable_bin(567.89)
'1k'

>>> readable_bin(21, start_digits=[1, 2, 4, 8])
'40'

>>> readable_bin(-1234567)
'-2M'
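
The examples above suggest that values are rounded up to the next threshold digit at the same power of ten, and roll over to the next power of ten when no threshold is large enough. A minimal sketch of that rule follows. It is not the tolkein implementation: the name readable_bin_sketch, the suffix list, the treatment of zero and the fallback notation for very small or very large values are all assumptions made for illustration.

```python
import math


def readable_bin_sketch(value, *, start_digits=None):
    """Round value up to a human readable bin label (illustrative only)."""
    if start_digits is None:
        start_digits = [1, 2, 3, 5]
    if value == 0:
        return "0"  # assumption: zero is returned unchanged
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    exponent = math.floor(math.log10(magnitude))
    mantissa = magnitude / 10 ** exponent
    # Pick the smallest threshold that is not below the mantissa.
    for digit in sorted(start_digits):
        if mantissa <= digit:
            break
    else:
        # No threshold is large enough, so roll over to the next power of ten.
        digit, exponent = 1, exponent + 1
    suffixes = ["", "k", "M", "G", "T", "P"]  # assumed suffix set
    group, remainder = divmod(exponent, 3)
    if 0 <= group < len(suffixes):
        return f"{sign}{digit * 10 ** remainder}{suffixes[group]}"
    # Outside the suffix range, fall back to exponent notation.
    return f"{sign}{digit}e{exponent}"


labels = [
    readable_bin_sketch(123456789),
    readable_bin_sketch(56789012234),
    readable_bin_sketch(567.89),
    readable_bin_sketch(21, start_digits=[1, 2, 4, 8]),
    readable_bin_sketch(-1234567),
]
```

With these inputs the sketch reproduces the five documented labels. Behaviour for values that sit exactly on a threshold is a guess and may differ from the library.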

tofetch

Fetch methods.

class tolkein.lib.tofetch.TqdmUpTo(iterable=None, desc=None, total=None, leave=True, file=None, ncols=None, mininterval=0.1, maxinterval=10.0, miniters=None, ascii=None, disable=False, unit='it', unit_scale=False, dynamic_ncols=False, smoothing=0.3, bar_format=None, initial=0, position=None, postfix=None, unit_divisor=1000, write_bytes=None, lock_args=None, nrows=None, colour=None, delay=0, gui=False, **kwargs)

Provides update_to(n) which uses tqdm.update(delta_n).

From tqdm documentation.

update_to(b=1, bsize=1, tsize=None)

Tqdm update_to method.

Parameters:
  • b (int, optional) – Number of blocks transferred so far [default: 1].
  • bsize (int, optional) – Size of each block (in tqdm units) [default: 1].
  • tsize (int, optional) – Total size (in tqdm units).
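
The three arguments match the callback signature used by urllib.request.urlretrieve's reporthook (block number, block size, total size). Because that callback reports cumulative progress while a progress bar expects relative updates, update_to has to convert one into the other. The sketch below shows the arithmetic with a hypothetical ProgressCounter stand-in so that it runs without tqdm installed; it is not the TqdmUpTo class itself.

```python
class ProgressCounter:
    """Hypothetical stand-in for a progress bar that only accepts relative updates."""

    def __init__(self, total=None):
        self.n = 0  # units counted so far
        self.total = total

    def update(self, delta):
        """Advance the counter by delta units."""
        self.n += delta

    def update_to(self, b=1, bsize=1, tsize=None):
        """Convert cumulative progress (b blocks of bsize units) into a delta."""
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)


bar = ProgressCounter()
# Simulate a download that reports after each of three 1024-unit blocks.
for blocks_done in (1, 2, 3):
    bar.update_to(b=blocks_done, bsize=1024, tsize=3000)
```

The counter can overshoot the total on the last block because the final block is usually only partly filled.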
tolkein.lib.tofetch.extract_tar(filename, path)

Extract tarred archive.

Parameters:
  • filename (str) – Name of tar file to extract.
  • path (str) – Path to extract tar file.
Returns:

Count of file members extracted from tar archive.

Return type:

int
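
As a rough illustration of what extracting an archive and returning a member count can look like, here is a sketch that uses only the standard library tarfile module. The name extract_tar_sketch is hypothetical, and counting only regular files (not directories) is an assumption about what "file members" means.

```python
import os
import tarfile
import tempfile


def extract_tar_sketch(filename, path):
    """Extract a tar archive into path and return the number of regular files."""
    with tarfile.open(filename) as archive:
        members = archive.getmembers()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters for safety.
            archive.extractall(path, filter="data")
        else:
            archive.extractall(path)
    return sum(1 for member in members if member.isfile())


# Build a small archive in a temporary directory, then extract it again.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "src")
    os.mkdir(source)
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(source, name), "w") as handle:
            handle.write(name)
    tar_path = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(tar_path, "w:gz") as archive:
        archive.add(source, arcname="src")
    target = os.path.join(tmp, "out")
    os.mkdir(target)
    count = extract_tar_sketch(tar_path, target)
    extracted = os.path.exists(os.path.join(target, "src", "a.txt"))
```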

tolkein.lib.tofetch.fetch_file(url, path, decode=True)

Fetch a remote file.

Parameters:
  • url (str) – Remote URL to fetch.
  • path (str) – Path to write the fetched file to.
  • decode (bool, optional) – Determines whether to unzip content. Defaults to True.

tolkein.lib.tofetch.fetch_ftp(url, filename)

Fetch a file via ftp.

tolkein.lib.tofetch.fetch_stream(url, *, decode=True, show_progress=True)

Stream download.

Parameters:
  • url (str) – Remote URL to fetch.
  • decode (bool, optional) – Determines whether to unzip content. Defaults to True.
  • show_progress (bool, optional) – Show a progress bar to indicate file streaming progress. Defaults to True.
Yields:

str – 1024 byte chunk of remote URL.
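
Streaming in fixed-size chunks keeps memory use flat regardless of file size. The sketch below shows the general generator pattern with an in-memory buffer standing in for the remote response, so it runs offline. The name stream_chunks is hypothetical and this is not how fetch_stream is implemented.

```python
import io


def stream_chunks(handle, chunk_size=1024):
    """Yield successive chunks from a binary file-like object until it is empty."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        yield chunk


# An in-memory buffer stands in for a remote response body.
payload = io.BytesIO(b"x" * 2500)
sizes = [len(chunk) for chunk in stream_chunks(payload)]
```

Every chunk except possibly the last is exactly chunk_size bytes long, so consumers should not assume a fixed length.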

tolkein.lib.tofetch.fetch_tar(url, path)

Fetch and extract tarred archives.

Parameters:
  • url (str) – Remote URL to fetch.
  • path (str) – Path to extract tar file.
Returns:

Count of file members extracted from tar archive.

Return type:

int

tolkein.lib.tofetch.fetch_tmp_file(url)

Fetch a remote URL to a temporary file.

Parameters: url (str) – Remote URL to fetch.
Returns: Temporary filename.
Return type: str

tolkein.lib.tofetch.fetch_url(url)

Fetch a URL.

Parameters: url (str) – Remote URL to fetch.
Returns: Content of file as a string. Will return None if response is not OK.
Return type: str

tofile

Read, write and parse files.

tolkein.lib.tofile.delete_file(filename)

Delete a file if it exists.

Parameters: filename (str) – Name of file to delete.

tolkein.lib.tofile.load_yaml(filename)

Parse a JSON/YAML file.

Parameters: filename (str) – Name of JSON/YAML file to parse.
Returns: Dict or list of file content.

tolkein.lib.tofile.open_file_handle(filename)

Open a filehandle.

Automatically detect gzipped files based on suffix.

Parameters: filename (str) – Name of file to read.
Returns: An open filehandle. Will return None if the file cannot be opened.

tolkein.lib.tofile.read_file(filename)

Read a whole file into memory.

Automatically detect gzipped files based on suffix.

Parameters: filename (str) – Name of file to read.
Returns: Content of file as a string. Will return None if file cannot be read.
Return type: str

tolkein.lib.tofile.stream_fasta(filename)

Stream a FASTA file, sequence by sequence.

Automatically detect gzipped files based on suffix.

Parameters:

filename (str) – Name of FASTA file to read.

Yields:

A tuple of:

(
    str: Sequence ID,
    str: Sequence string
)
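
To make the streaming behaviour concrete, the following sketch yields (ID, sequence) tuples one record at a time and opens gzipped input when the filename ends in .gz. It is an illustration and not the library code: the names open_text and stream_fasta_sketch are hypothetical, and taking the first whitespace-delimited token of the header as the sequence ID is an assumption.

```python
import gzip
import os
import tempfile


def open_text(filename):
    """Open a plain or gzipped text file, chosen by suffix."""
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename, "r")


def stream_fasta_sketch(filename):
    """Yield (sequence ID, sequence string) tuples, one record at a time."""
    seq_id, chunks = None, []
    with open_text(filename) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                header = line[1:].split()
                seq_id = header[0] if header else ""
                chunks = []
            elif seq_id is not None:
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Write a small gzipped FASTA file and stream it back.
with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, "demo.fa.gz")
    with gzip.open(fasta_path, "wt") as handle:
        handle.write(">seq1 first record\nACGT\nACGT\n>seq2\nTTGA\n")
    records = list(stream_fasta_sketch(fasta_path))
```

Only one record is held in memory at a time, which matters for large assemblies.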
tolkein.lib.tofile.write_file(filename, data, *, plain=False)

Write a file, use suffix to determine type and compression.

  • types: ‘.json’, ‘.yaml’
  • compression: None, ‘.gz’
Parameters:
  • filename (str) – Name of file to write.
  • data – data to write to file.
  • plain (bool, optional) – Whether to treat data as plain text. Defaults to False.
Returns:

Whether file was written successfully.

Return type:

bool
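
Choosing the serialiser and compression from the filename suffix can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: write_file_sketch is a hypothetical name, YAML output is omitted because it needs a third-party package, and reporting failure only for OSError is a guess at what "written successfully" covers.

```python
import gzip
import json
import os
import tempfile


def write_file_sketch(filename, data, *, plain=False):
    """Write data as JSON or plain text, gzipped when the name ends in .gz."""
    compressed = filename.endswith(".gz")
    opener = gzip.open if compressed else open
    stem = filename[:-3] if compressed else filename
    try:
        with opener(filename, "wt") as handle:
            if not plain and stem.endswith(".json"):
                json.dump(data, handle)
            else:
                handle.write(str(data))
    except OSError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "meta.json.gz")
    ok = write_file_sketch(target, {"taxid": 9606})
    with gzip.open(target, "rt") as handle:
        loaded = json.load(handle)
    # Writing into a directory that does not exist reports failure.
    failed = write_file_sketch(os.path.join(tmp, "no-such-dir", "meta.json"), {})
```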

toinsdc

INSDC methods.

tolkein.lib.toinsdc.count_taxon_assembly_meta(root)

Count INSDC assemblies descended from root taxon.

Parameters: root (int) – Root taxon taxid.
Returns: Count of assemblies for taxa descended from root. Will return None on error.
Return type: int

tolkein.lib.toinsdc.fetch_wgs_assembly_meta(root, *, count=-1, offset=0, page=10000)

Query INSDC WGS assemblies descended from root taxon.

Parameters:
  • root (int) – Root taxon taxid.
  • count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
  • offset (int) – Offset of first assembly to return. Defaults to 0.
  • page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields:

dict – A dict of INSDC WGS assembly metadata keyed on sample accession.

tolkein.lib.toinsdc.stream_taxon_assembly_meta(root, *, count=-1, offset=0, page=10000)

Query INSDC assemblies descended from root taxon.

Parameters:
  • root (int) – Root taxon taxid.
  • count (int) – Number of assemblies to return. Default value (-1) returns all assemblies.
  • offset (int) – Offset of first assembly to return. Defaults to 0.
  • page (int) – Number of assemblies to fetch per API request. Defaults to 10000.
Yields:

dict – Normalised dict of INSDC metadata.
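
The count, offset and page parameters of the two functions above describe a paged query: records are requested page at a time until count records have been yielded or the source runs out. The sketch below shows one way such a loop can work. The names stream_pages and fake_fetch and the in-memory dataset are hypothetical stand-ins for the remote API, and the stopping rule on a short page is an assumption.

```python
def stream_pages(fetch_page, *, count=-1, offset=0, page=10000):
    """Yield records page by page; fetch_page(offset, limit) returns a list."""
    yielded = 0
    while count < 0 or yielded < count:
        # Never ask for more records than are still wanted.
        limit = page if count < 0 else min(page, count - yielded)
        records = fetch_page(offset + yielded, limit)
        if not records:
            break
        for record in records:
            yield record
            yielded += 1
        if len(records) < limit:
            break  # a short page means the source is exhausted


# A list stands in for the remote service.
dataset = [{"accession": f"ASM{i}"} for i in range(25)]


def fake_fetch(offset, limit):
    return dataset[offset:offset + limit]


first = [r["accession"] for r in stream_pages(fake_fetch, count=7, offset=5, page=3)]
everything = list(stream_pages(fake_fetch, page=10))
```

Because the function is a generator, callers can stop iterating early without fetching the remaining pages.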

tolog

Log events.

class tolkein.lib.tolog.DisableLogger

Logger context management.

>>> my_logger.info('Print log messages')
>>> with DisableLogger():
...     my_logger.info('Disable log messages')
>>> my_logger.info('Print log messages again')

__enter__()

Set logging level to critical.

__exit__(x, y, z)

Set logging level back to default.
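
A context manager with this behaviour can be sketched with the standard library's logging.disable, which suppresses every message at or below a given level. Whether DisableLogger uses this exact call is an assumption. DisableLoggerSketch, ListHandler and the "tolkein-demo" logger name are hypothetical and exist only so the example can check its own output.

```python
import logging


class DisableLoggerSketch:
    """Suppress log output inside a with block (illustrative only)."""

    def __enter__(self):
        # Disable all messages up to and including CRITICAL.
        logging.disable(logging.CRITICAL)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore normal logging; exceptions are not swallowed.
        logging.disable(logging.NOTSET)
        return False


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


my_logger = logging.getLogger("tolkein-demo")
my_logger.setLevel(logging.INFO)
my_logger.propagate = False
handler = ListHandler()
my_logger.addHandler(handler)

my_logger.info("before")
with DisableLoggerSketch():
    my_logger.info("suppressed")
my_logger.info("after")
```

Note that logging.disable is process-wide, so every logger is silenced inside the block and not only the one being used.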

tolkein.lib.tolog.logger(name='tolkein')

Create logger.

Parameters: name (str, optional) – Logger name. Defaults to “tolkein”.
Returns: A logger instance.
Return type: logging.Logger

totax

Taxonomy methods.

tolkein.lib.totax.add_xrefs(names, xrefs)

Add xrefs to a list of taxon names.

tolkein.lib.totax.parse_ncbi_names_dmp(path, nodes)

Parse names.dmp file and add to nodes dict.

tolkein.lib.totax.parse_ncbi_nodes_dmp(path)

Parse NCBI-format nodes.dmp file.

tolkein.lib.totax.parse_ncbi_taxdump(path, root=None)

Expand lineages from nodes dict.

tolkein.lib.totax.parse_ott_names_dmp(path, nodes)

Parse synonyms.tsv file and add to nodes dict.

tolkein.lib.totax.parse_ott_nodes_dmp(path)

Parse Open Tree of Life taxonomy.tsv file.

tolkein.lib.totax.parse_ott_taxdump(path, root=None)

Expand lineages from nodes dict.

tolkein.lib.totax.parse_taxonomy(taxonomy_type, path, root=None)

Parse taxonomy into list of dicts.

tolkein.lib.totax.stream_nodes(nodes, roots)

Add lineage info and stream taxonomy nodes.
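
The functions in this module are documented only briefly, so the following sketch illustrates the general idea of parsing NCBI-style nodes.dmp rows and expanding a lineage by walking parent links. It does not reflect tolkein's data structures or return values: the function names, the dict layout and the three toy rows are assumptions. Only the field layout (taxid, parent taxid, rank, separated by tab-pipe-tab) follows the NCBI taxdump format.

```python
def parse_nodes_dmp_lines(lines):
    """Parse nodes.dmp-style rows into a dict keyed on taxid (as strings)."""
    nodes = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        fields = line.split("\t|\t")
        if len(fields) < 3:
            continue  # skip malformed rows
        taxon_id, parent_id, rank = fields[0], fields[1], fields[2]
        nodes[taxon_id] = {"parent": parent_id, "rank": rank}
    return nodes


def lineage(nodes, taxon_id):
    """Return taxids from taxon_id up to the root, following parent links."""
    path = []
    current = taxon_id
    while current in nodes and current not in path:
        path.append(current)
        parent = nodes[current]["parent"]
        if parent == current:
            break  # the root node is its own parent
        current = parent
    return path


# Toy rows with made-up taxids, truncated to the first three columns.
toy_rows = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tfamily\t|\n",
    "3\t|\t2\t|\tgenus\t|\n",
]
toy_nodes = parse_nodes_dmp_lines(toy_rows)
genus_lineage = lineage(toy_nodes, "3")
```

The membership check in the loop also guards against cycles in malformed input.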

Contributing

Bug reports

When reporting a bug please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.

Documentation improvements

Contributions to the official Tolkein docs and internal docstrings are always welcome.

Feature requests and feedback

The best way to send feedback is to file an issue at https://github.com/tolkit/tolkein/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that code contributions are welcome :)

Development

To set up tolkein for local development:

  1. Fork tolkein (look for the “Fork” button).

  2. Clone your fork locally:

    git clone git@github.com:USERNAME/tolkein.git
    
  3. Create a branch for local development:

    git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  4. When you’re done making changes, run all the checks and the docs builder with one tox command:

    tox
    
  5. Commit your changes and push your branch to GitHub:

    git add .
    git commit -m "Your detailed description of your changes."
    git push origin name-of-your-bugfix-or-feature
    
  6. Submit a pull request through the GitHub website.

Pull Request Guidelines

If you need some code review or feedback while you’re developing the code, just make the pull request.

For merging, you should:

  1. Include passing tests (run tox) [1].
  2. Update documentation when there’s new API, functionality etc.
  3. Add a note to CHANGELOG.rst about the changes.
  4. Add yourself to AUTHORS.rst.

[1] If you don’t have all the necessary Python versions available locally, you can rely on Travis: it will run the tests for each change you add in the pull request. It will be slower, though.

Tips

To run a subset of tests:

tox -e envname -- pytest -k test_myfeature

To run all the test environments in parallel:

tox -p

Authors

Changelog

0.2.0

Features:

  • Added fetch methods
  • Switched code style to black/flake8

Bug fixes:

  • Stopped log messages printing twice

0.0.1 (2020-07-02)

  • First release on PyPI.
