should `_load_databases` indicate how many incompatible signatures were filtered out?

Status: Open. ctb opened this issue 3 years ago • 4 comments

After https://github.com/dib-lab/sourmash/pull/1420, we run the risk of silently filtering out large numbers of incompatible signatures. Perhaps we should print the number of filtered signatures in the `_load_databases` code?

See for example test_search_traverse_incompatible as something that could say, "one signature was ignored."
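
A minimal sketch of what such a notice could look like, written in plain Python rather than against sourmash's actual loading code. The `Sketch` record, `select_compatible`, `ignored_message`, and the message wording are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sketch:
    # Hypothetical stand-in for a signature; not a sourmash class.
    name: str
    ksize: int
    moltype: str


def select_compatible(sketches, ksize, moltype):
    """Return (kept, n_ignored) for the requested ksize and moltype."""
    kept = [s for s in sketches if s.ksize == ksize and s.moltype == moltype]
    return kept, len(sketches) - len(kept)


def ignored_message(n_ignored):
    """Format a notice about filtered-out signatures, or '' if there were none."""
    if n_ignored == 0:
        return ""
    noun = "signature was" if n_ignored == 1 else "signatures were"
    return f"{n_ignored} incompatible {noun} ignored."


sketches = [
    Sketch("a", 31, "DNA"),
    Sketch("b", 31, "DNA"),
    Sketch("c", 21, "protein"),
]
kept, n_ignored = select_compatible(sketches, ksize=31, moltype="DNA")
print(ignored_message(n_ignored))
```

Returning the count alongside the kept sketches leaves the caller free to decide whether to print the notice, log it, or stay quiet.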

ctb, Mar 31 '21 17:03

#1637 is relevant - when do we complain about having empty databases to search? 😄

Also, UX principles for large collections (https://github.com/sourmash-bio/sourmash/issues/1350) and enumerators or progress bars (https://github.com/sourmash-bio/sourmash/issues/1082) are much more straightforward with manifests.

ctb, Jun 26 '21 14:06

See the relevant comment on #1082 about how progress bars might be neither possible nor a good idea: https://github.com/sourmash-bio/sourmash/issues/1082#issuecomment-1065900888.

I'm wondering if the right answer is to track the total number of signatures in a collection (using e.g. manifests) and when doing a search of some kind, provide a generic indicator of what fraction of the collection is actually being searched? This should be straightforward.
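
As a rough illustration of that indicator, here is a sketch that assumes the total and selected counts are already known (for example, from the number of manifest rows before and after selection). The function name `search_coverage` and the output wording are hypothetical and not part of sourmash.

```python
def search_coverage(total, selected):
    """Describe what fraction of a collection is actually being searched.

    `total` and `selected` are plain counts; how they are obtained
    (e.g. from manifests) is outside the scope of this sketch.
    """
    if total < 0 or not 0 <= selected <= total:
        raise ValueError("need 0 <= selected <= total")
    if total == 0:
        return "collection is empty; nothing to search."
    pct = 100.0 * selected / total
    return f"searching {selected} of {total} sketches ({pct:.1f}%)."


print(search_coverage(384, 128))
```

Handling the empty collection as its own case avoids a division by zero and gives a natural place to complain about empty databases.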

ctb, Mar 26 '22 15:03

I really like the idea that with manifests, we just output something like this:

loaded/found a total of X sketches
after sketch selection, Y sketches remaining

ctb, Aug 03 '22 11:08

Updated in https://github.com/sourmash-bio/sourmash/pull/2204 - sourmash_args.load_dbs_and_sigs now displays information like so:

loaded 384 total signatures from 65 locations.
after selecting signatures compatible with search, 128 remain.

This currently applies only to the search, gather, and multigather subcommands, although prefetch displays similar output.

compare and the various sig subcommands have not been updated yet.

ctb, Aug 15 '22 14:08