
zipfile databases could include taxonomy information

Open • bluegenes opened this issue 1 year ago • 7 comments

I think I mentioned this as part of another issue, but I think it would make taxonomy analyses a bit simpler if we included our taxonomy csv (optionally formatted as an sqldb) within the zipfile, similar to how we currently include the manifest.

obviously, this only works for pre-prepared databases, but we could make a database method to index a user-generated taxonomy csv and add it to a sourmash database.

bluegenes • Jul 26 '22 16:07
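Here is a minimal Python sketch of the proposed "index a user-generated taxonomy csv and add it to a sourmash database" method. It uses only the standard `zipfile` and `csv` modules. Everything in it is hypothetical and is not existing sourmash behavior: the member name `taxonomy.csv`, the function names, and the assumption that the taxonomy CSV has an `ident` column. The function appends the CSV alongside the archive's existing contents (manifest and signatures) without rewriting them, and it refuses to add a second copy.

```python
import csv
import io
import zipfile

# Hypothetical archive member name; sourmash does not define this today.
TAXONOMY_MEMBER = "taxonomy.csv"


def add_taxonomy_to_zip(zip_path: str, tax_csv_path: str,
                        member: str = TAXONOMY_MEMBER) -> int:
    """Append a taxonomy CSV to an existing zipfile database.

    Returns the number of lineage rows stored. Assumes (hypothetically)
    that the CSV has an 'ident' column identifying each genome.
    """
    with open(tax_csv_path, newline="") as fp:
        text = fp.read()

    # Sanity-check the CSV before touching the archive.
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or "ident" not in reader.fieldnames:
        raise ValueError("taxonomy CSV must have an 'ident' column")
    n_rows = sum(1 for _ in reader)

    # Mode "a" adds a new member without rewriting the existing ones.
    with zipfile.ZipFile(zip_path, mode="a",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        if member in zf.namelist():
            raise FileExistsError(f"{member} already present in {zip_path}")
        zf.writestr(member, text)  # str is stored UTF-8 encoded
    return n_rows


def read_taxonomy_from_zip(zip_path: str,
                           member: str = TAXONOMY_MEMBER) -> dict:
    """Load the embedded taxonomy as {ident: row_dict}."""
    with zipfile.ZipFile(zip_path) as zf:
        text = zf.read(member).decode("utf-8")
    return {row["ident"]: row for row in csv.DictReader(io.StringIO(text))}
```

Because the taxonomy is stored as a plain CSV, the archive stays readable with any zip tool, and loading it requires no extraction step.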

yep! also see https://github.com/sourmash-bio/sourmash/issues/2012

ctb • Jul 26 '22 17:07

Whoops, shall I close this duplicate?

bluegenes • Jul 26 '22 18:07

nah, we can consolidate later

ctb • Jul 26 '22 18:07

When we do this, it would be really neat to allow signature selection on the taxonomy, e.g. `--include s__Phaeobacter`. Yes, we can currently do this with `sig grep` or picklists using the taxonomy file, but seamless integration would be great.

bluegenes • Jul 30 '22 16:07
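Until an option like that exists, the same selection can be approximated by turning the taxonomy file into a picklist. The sketch below assumes sourmash-style lineage columns (`superkingdom` through `strain`) plus an `ident` column, and its function names are hypothetical. A rank counts as a match when it equals the query or begins with the query followed by a space. Under that rule, `s__Phaeobacter` also picks up species-level names of the form `s__Phaeobacter <epithet>`.

```python
import csv

# Assumed lineage columns of a sourmash-style taxonomy CSV.
RANKS = ("superkingdom", "phylum", "class", "order",
         "family", "genus", "species", "strain")


def lineage_matches(row: dict, name: str) -> bool:
    """True if any rank equals `name`, or starts with `name` plus a space."""
    for rank in RANKS:
        value = (row.get(rank) or "").strip()  # missing columns -> ""
        if value == name or value.startswith(name + " "):
            return True
    return False


def write_picklist(tax_csv_path: str, name: str, out_path: str) -> list:
    """Write the idents whose lineage matches `name` to a one-column CSV."""
    with open(tax_csv_path, newline="") as fp:
        idents = [row["ident"] for row in csv.DictReader(fp)
                  if lineage_matches(row, name)]
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["ident"])
        writer.writerows([i] for i in idents)
    return idents
```

The output is a one-column CSV. It should work with the existing picklist support if the usual `file:column:coltype` syntax applies (e.g. `pick.csv:ident:ident`).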

side note, I'm 99% sure that SqliteIndex supports storing both signatures and taxonomies already.

ctb • Aug 05 '22 22:08

at least,

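```shell
rm -f xyz.sqldb    # start from a clean file
# sketches: concatenate the k=31 signatures into a SQLite-backed collection
sourmash sig cat -k 31 podar-ref/1.fa.sig -o xyz.sqldb
# taxonomy: convert the lineage CSV to SQL format, writing into the same file
sourmash tax prepare -t podar-ref/podar-lineage.csv -o xyz.sqldb -F sql
# search, then classify using the taxonomy stored in xyz.sqldb
sourmash gather podar-ref/1.fa.sig xyz.sqldb -o xyz.csv
sourmash tax genome -t xyz.sqldb -g xyz.csv
```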

works without complaint... so yeah, seems to work :)

ctb • Aug 05 '22 22:08
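To see what actually ended up in a file like `xyz.sqldb`, here is a generic sketch that uses Python's built-in `sqlite3` module to list the tables in any SQLite file. It assumes nothing about sourmash's schema, so it does not claim which table names to expect. It opens the database read-only through a `file:` URI, so the check cannot create or modify anything.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list:
    """Return the sorted names of all tables in a SQLite file."""
    # Read-only URI: fails instead of creating a new empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

Running `list_tables("xyz.sqldb")` after the commands above should show whether the signature tables and the taxonomy tables share the one file.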

> ... we included our taxonomy csv (optionally formatted as an sqldb)

note, SQLite does not support loading a database from within a zipfile :(

ctb • Aug 10 '22 16:08
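One workaround is to copy the database member out of the zipfile into a temporary file and open that copy instead. The sketch below uses only the standard library. The helper name and the member name used in the usage line (`taxonomy.sqldb`) are hypothetical.

```python
import os
import shutil
import sqlite3
import tempfile
import zipfile
from contextlib import contextmanager


@contextmanager
def sqlite_from_zip(zip_path: str, member: str):
    """Yield a sqlite3 connection to a database stored inside a zipfile.

    SQLite cannot read from inside the archive, so the member is copied
    to a temporary file first; the copy is deleted on exit.
    """
    tmpdir = tempfile.mkdtemp()
    try:
        tmp_path = os.path.join(tmpdir, "extracted.sqldb")
        with zipfile.ZipFile(zip_path) as zf, zf.open(member) as src, \
                open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, don't load into memory
        con = sqlite3.connect(tmp_path)
        try:
            yield con
        finally:
            con.close()  # close before removing the file (needed on Windows)
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Usage: `with sqlite_from_zip("db.zip", "taxonomy.sqldb") as con: ...`. The cost is a temporary copy the size of the database. On Python 3.11+, `sqlite3.Connection.deserialize()` can instead load the member's bytes into an in-memory database, provided the underlying SQLite library supports serialization.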