Welcome to wort! It's a database for sourmash signatures, currently focused on indexing datasets from the NCBI Sequence Read Archive, the JGI Integrated Microbial Genomes and Microbiomes portal, and NCBI Assembly resources (GenBank and RefSeq).
There are currently 7,947,644 datasets indexed, with signatures calculated from 2,833.14 TB of original data.
Here are some example pages for these datasets:
- SRA: SRR15461028
- IMG: 2728369338
- NCBI Assemblies: GCA_016787465.1
For more information, see the poster presented at Biological Data Science 2018.
Also see the GitHub repository, where we keep track of code, issues, and feature requests.
To check the resources used by wort, see the public dashboard, which shows storage usage, queue sizes, and upload rates to S3 storage.