C. Titus Brown
https://sourmash.readthedocs.io/en/latest/using-LCA-database-API.html
- change title to include "in Python" or something
- maybe deprecate altogether
This has aged and needs updates:
- https://sourmash.readthedocs.io/en/latest/sourmash-collections.html
- use RocksDB
- use `sourmash tax`
should work better than Jaccard below a threshold, yes?
...since `sourmash gather` outputs `name` while `fastgather` and `fastmultigather` output `match_name`. Note that `::prefetch` works fine with `fastgather` and `fastmultigather` outputs, so 🤷
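A minimal sketch of one way to paper over the column-name difference when reading either kind of output, using only the standard library. The helper name `normalize_match_column` and the two-column example CSV are hypothetical, made up for illustration; real outputs carry many more columns, and this is not something sourmash or the branchwater plugin does for you.

```python
import csv
import io


def normalize_match_column(csv_text):
    """Return rows that always carry a `name` key (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # fastgather / fastmultigather write `match_name`; copy it to
        # `name` only when the row has no `name` column of its own.
        if "name" not in row and "match_name" in row:
            row["name"] = row["match_name"]
        rows.append(row)
    return rows


# Hypothetical, heavily trimmed excerpt of a fastgather-style CSV.
example = "match_name,intersect_bp\ngenomeA,1000\n"
rows = normalize_match_column(example)
print(rows[0]["name"])
```

Rows that already have a `name` column (as in `sourmash gather` output) pass through unchanged.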
ref:
* https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/501
* discussion in https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/445
* and also benchmarks in https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/525
The following code:
```python
import pylab  # ships with matplotlib

# gather_df: gather results, presumably loaded as a pandas DataFrame (loading not shown)
pylab.plot(gather_df.gather_result_rank, gather_df.query_containment_ani, label='query ani')
pylab.plot(gather_df.gather_result_rank, gather_df.match_containment_ani, label='match ani')
pylab.plot(gather_df.gather_result_rank, gather_df.average_containment_ani, label='average ani')
pylab.title('SRR606249 (podar) gather results')
pylab.xlabel('gather result rank')
pylab.ylabel('ANI')
pylab.legend(loc='lower left')
pylab.savefig('/tmp/gather-ani.png')
```
shows: ...
`sourmash tax annotate` nicely annotates gather results with a lineage column, producing a with-lineages CSV. Maybe `sourmash tax metagenome` (and `tax genome`?) could consume those natively? Right now they don't...
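To make the idea concrete, here is a rough sketch of what consuming a with-lineages CSV directly could look like. It is not how `sourmash tax metagenome` works internally. It assumes the annotated CSV has a semicolon-separated `lineage` column next to a gather fraction column such as `f_unique_weighted`; the function name `summarize_by_rank` and the example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize_by_rank(csv_text, rank_index, fraction_column="f_unique_weighted"):
    """Sum a fraction column over lineages truncated at `rank_index` (0-based)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed layout: "rank0;rank1;..." in the `lineage` column.
        ranks = row["lineage"].split(";")
        prefix = ";".join(ranks[: rank_index + 1])
        totals[prefix] += float(row[fraction_column])
    return dict(totals)


# Hypothetical, trimmed with-lineages rows.
example = (
    "name,f_unique_weighted,lineage\n"
    "genomeA,0.25,d__Bacteria;p__Proteobacteria\n"
    "genomeB,0.5,d__Bacteria;p__Firmicutes\n"
)
summary = summarize_by_rank(example, rank_index=0)
print(summary)
```

Here both example rows collapse onto the same top-level lineage, so their fractions are added together.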
e.g. grepq uses seq_io. https://www.biorxiv.org/content/10.1101/2025.01.09.632104v1.full
https://github.com/sourmash-bio/sourmash_plugin_betterplot/issues/67 It's tricky, but do-able :)
our local HPC is down for maintenance, and that's where we store the databases. apologies!