sourmash
should `_load_databases` indicate how many incompatible signatures were filtered out?
After https://github.com/dib-lab/sourmash/pull/1420, we run the risk of silently selecting away large numbers of incompatible signatures. Perhaps we should print this out in the `_load_databases` code?
See, for example, `test_search_traverse_incompatible` as something that could say, "one signature was ignored."
#1637 is relevant - when do we complain about having empty databases to search? 😄
Also, UX principles for large collections https://github.com/sourmash-bio/sourmash/issues/1350 - and enumerators (or progress bars?) https://github.com/sourmash-bio/sourmash/issues/1082 - are much more straightforward with manifests.
See relevant comment on #1082 about how progress bars might not be possible or a good idea - https://github.com/sourmash-bio/sourmash/issues/1082#issuecomment-1065900888.
I'm wondering if the right answer is to track the total number of signatures in a collection (using e.g. manifests) and when doing a search of some kind, provide a generic indicator of what fraction of the collection is actually being searched? This should be straightforward.
I really like the idea that with manifests, we just output something like this:
loaded/found a total of X sketches
after sketch selection, Y sketches remaining
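A minimal sketch of that kind of before/after report, assuming each manifest row carries a k-mer size and molecule type. `SketchInfo`, `select_compatible`, and `selection_report` are hypothetical names used for illustration and are not part of the sourmash API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchInfo:
    """Hypothetical stand-in for one manifest row; not a sourmash class."""
    name: str
    ksize: int
    moltype: str


def select_compatible(rows, ksize, moltype):
    """Keep only the rows that match the query's ksize and molecule type."""
    return [r for r in rows if r.ksize == ksize and r.moltype == moltype]


def selection_report(rows, ksize, moltype):
    """Return the kept rows plus a two-line summary of the selection."""
    kept = select_compatible(rows, ksize, moltype)
    lines = [
        f"loaded/found a total of {len(rows)} sketches",
        f"after sketch selection, {len(kept)} sketches remaining",
    ]
    return kept, lines


# Illustrative data only: two compatible rows and one with a different ksize.
rows = [
    SketchInfo("a", 31, "DNA"),
    SketchInfo("b", 31, "DNA"),
    SketchInfo("c", 21, "DNA"),
]
kept, report = selection_report(rows, ksize=31, moltype="DNA")
print("\n".join(report))
```

Because both counts come from the manifest rows alone, the report would not require loading any sketch data, which is the appeal of doing this with manifests.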
Updated in https://github.com/sourmash-bio/sourmash/pull/2204 - `sourmash_args.load_dbs_and_sigs` now displays information like so:
loaded 384 total signatures from 65 locations.
after selecting signatures compatible with search, 128 remain.
This is only for the `search`, `gather`, and `multigather` subcommands presently, although `prefetch` displays similar output. `compare` and the various `sig` subcommands remain to be tackled.