`sourmash tax genome` may fail silently with `--force` and incorrect gather inputs
`sourmash tax genome --force` fails silently or yields a confusing error if you pass in multiple gather results for the same query.
I accidentally passed in both k7 and k10 gather results for a set of queries. Without `--force`, this fails nicely with an error saying that more than one gather file was found for a particular query. With `--force`, we read both files in and then aggregate gather results across them. You then MIGHT get an error that the summarized percentage was > 100% (more than 100% of the query was matched), which should never happen. If the summed percentages had stayed below 100%, this would have failed silently and given incorrect results.
I think maybe we should never allow seeing a query in multiple gather files (disallow this force behavior).
- Now that `ksize`, `moltype`, and `scaled` are parameters in the gather csv, we should also check these and only allow summarization over the same params, for SAFETY! This would also have fixed the above issue.
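
A rough sketch of what both checks could look like, written as a standalone pre-flight script rather than as the actual `sourmash tax` code. The function name `check_gather_files` is hypothetical, and the sketch assumes each gather csv has `query_name`, `ksize`, `moltype`, and `scaled` columns. Empty csvs contribute no rows, so they pass through without tripping either check.

```python
import csv
from collections import defaultdict


def check_gather_files(paths):
    """Return a list of problems found across gather CSVs (empty list means OK).

    Hypothetical helper, not part of sourmash. Assumes the columns
    query_name, ksize, moltype, and scaled are present in each non-empty file.
    """
    files_by_query = defaultdict(set)  # query name -> files it appears in
    params_seen = set()                # distinct (ksize, moltype, scaled) tuples

    for path in paths:
        with open(path, newline="") as fp:
            # An empty file yields no rows, so it is skipped naturally.
            for row in csv.DictReader(fp):
                files_by_query[row["query_name"]].add(path)
                params_seen.add((row["ksize"], row["moltype"], row["scaled"]))

    problems = []
    for query, files in sorted(files_by_query.items()):
        if len(files) > 1:
            problems.append(
                f"query '{query}' found in multiple gather files: {sorted(files)}"
            )
    if len(params_seen) > 1:
        problems.append(
            f"mixed (ksize, moltype, scaled) values: {sorted(params_seen)}"
        )
    return problems
```

In the k7/k10 case above, either check alone would have flagged the inputs: the same query appears in two files, and the `ksize` values differ.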
Note that we need `--force` to continue past empty gather csvs, so fixing this is important (nudges self).
`--force` now works properly for empty taxonomies, too - fixed in #2218.