sourmash
zipfile databases could include taxonomy information
I think I mentioned this in another issue, but it would make taxonomy analyses a bit simpler if we included our taxonomy csv (optionally formatted as an sqldb) within the zipfile, similar to how we currently include the manifest.
obviously, this only works for pre-prepared databases, but we could make a database method to index a user-generated taxonomy csv and add it to a sourmash database.
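A rough sketch of what "add a taxonomy csv to an existing zipfile" could look like, using only the Python standard library. This is not sourmash's API: the member name `TAXONOMY.csv` and both function names are hypothetical, chosen purely for illustration.

```python
import csv
import io
import zipfile

# Hypothetical member name; sourmash does not currently define one for taxonomy.
TAXONOMY_MEMBER = "TAXONOMY.csv"


def add_taxonomy_to_zip(zip_path, taxonomy_csv_path, member=TAXONOMY_MEMBER):
    """Append a taxonomy CSV to an existing zip archive under `member`."""
    with zipfile.ZipFile(zip_path, "a") as zf:
        # Refuse to add a second copy rather than silently duplicating it.
        if member in zf.namelist():
            raise ValueError(f"{member} already present in {zip_path}")
        zf.write(taxonomy_csv_path, arcname=member)


def read_taxonomy_from_zip(zip_path, member=TAXONOMY_MEMBER):
    """Return the taxonomy rows stored in the archive as a list of dicts."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as fp:
            text = io.TextIOWrapper(fp, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

Appending in `"a"` mode leaves the existing signatures and manifest untouched, so a user-generated csv could be attached to a database after the fact.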
yep! also see https://github.com/sourmash-bio/sourmash/issues/2012
Whoops, shall I close this as a duplicate?
nah, we can consolidate later
When we do this, it would be really neat to allow signature selection on the taxonomy, e.g. --include s__Phaeobacter. Yes, we can currently do this with sig grep or picklists using the taxonomy file, but seamless integration would be great.
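For reference, the selection step itself is small. Here is a minimal sketch, assuming the taxonomy csv has an `ident` column plus one column per rank; the toy rows, the column layout, and the function names are all assumptions for illustration, not sourmash code. It picks out identifiers whose lineage matches a term and writes them as a one-column csv that could then be handed to the existing picklist mechanism.

```python
import csv
import io

# Toy lineage table for illustration only; identifiers are made up and the
# column layout is an assumption about how the taxonomy csv is organized.
LINEAGES = """ident,genus,species
toy_ident_1,g__Phaeobacter,s__Phaeobacter inhibens
toy_ident_2,g__Escherichia,s__Escherichia coli
"""


def select_idents(rows, term, ident_col="ident"):
    """Return identifiers whose lineage has a rank value starting with `term`."""
    selected = []
    for row in rows:
        ranks = (value for key, value in row.items() if key != ident_col)
        if any(value and value.startswith(term) for value in ranks):
            selected.append(row[ident_col])
    return selected


def write_picklist(idents, fp, ident_col="ident"):
    """Write the selected identifiers as a one-column csv with a header."""
    writer = csv.writer(fp)
    writer.writerow([ident_col])
    for ident in idents:
        writer.writerow([ident])


rows = list(csv.DictReader(io.StringIO(LINEAGES)))
matches = select_idents(rows, "s__Phaeobacter")
```

Prefix matching is just one possible choice here; an integrated option might instead want exact rank matches or regular expressions, as sig grep does.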
side note, I'm 99% sure that SqliteIndex supports storing both signatures and taxonomies already.
at least,
rm xyz.sqldb
sourmash sig cat -k 31 podar-ref/1.fa.sig -o xyz.sqldb
sourmash tax prepare -t podar-ref/podar-lineage.csv -o xyz.sqldb -F sql
sourmash gather podar-ref/1.fa.sig xyz.sqldb -o xyz.csv
sourmash tax genome -t xyz.sqldb -g xyz.csv
works without complaint... so yeah, seems to work :)
... we included our taxonomy csv (optionally formatted as an sqldb)
note, SQLite does not support loading a database from within a zipfile :(
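One possible workaround, sketched with the standard library only: copy the sqldb member out of the zipfile into a temporary file and open that with `sqlite3`. The helper name `open_sqlite_from_zip` and the idea of storing the database as a zip member are assumptions for illustration; nothing here is existing sourmash behavior.

```python
import shutil
import sqlite3
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def open_sqlite_from_zip(zip_path, member):
    """Extract `member` to a temporary file and yield a sqlite3 connection to it.

    The temporary copy is removed when the context exits, so any changes
    made through the connection are NOT written back into the zip.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        target = Path(tmpdir) / "extracted.sqldb"
        with zipfile.ZipFile(zip_path) as zf:
            with zf.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        conn = sqlite3.connect(str(target))
        try:
            yield conn
        finally:
            # Close before the temporary directory is cleaned up.
            conn.close()
```

The cost is a full copy of the database on every open, which may be acceptable for a small taxonomy table but less so for a large one; shipping the taxonomy as a plain csv inside the zip avoids the issue entirely.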