charcoal
`gather_at_rank` handling ties in taxonomic assignment
#171 updated charcoal to sourmash>=4.1.0, including switching from sourmash search to sourmash prefetch.
The taxonomy output for one contig in test file LoombaR_2017__SID1050_bax__bin.11.fa.gz changed. As recorded in that issue:
```
jq . < tests/test-data/loomba/LoombaR_2017__SID1050_bax__bin.11.fa.gz.contigs-tax.json > out.old
jq . < tests/test-data/loomba/LoombaR_2017__SID1050_bax__bin.11.fa.gz.contigs-tax.json > out.new
diff out.old out.new
2629c2629
< "f__Acutalibacteraceae"
---
> "f__Oscillospiraceae"
2633c2633
< "g__Anaeromassilibacillus"
---
> "g__Flavonifractor"
```
@ctb surmised:
> This is likely because gather doesn't report ties, per dib-lab/sourmash#1366 and dib-lab/sourmash#278. It is slightly surprising in this case that the tie here is above the family level (!!) but these things happen.
>
> It's probably a good idea for gather_at_rank to detect and handle/report such ties, and probably pull the taxonomic assignment up to the level above the tie.
@bluegenes
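For discussion, here is a rough sketch of what "detect the tie and pull the assignment up" could look like. It is not charcoal's actual `gather_at_rank` code: the function names (`common_prefix`, `assign_with_ties`), the `(lineage, score)` input shape, the placeholder order name `o__SharedOrder`, and the scores are all assumptions made up for illustration. The idea is to collect every lineage that shares the best score, and if there is more than one, report the tie and keep only the ranks they all agree on.

```python
from typing import List, Sequence, Tuple

# A lineage is an ordered tuple of rank names, highest rank first.
Lineage = Tuple[str, ...]


def common_prefix(lineages: Sequence[Lineage]) -> Lineage:
    """Return the leading ranks shared by every lineage."""
    prefix = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break  # first rank where the lineages disagree
        prefix.append(names[0])
    return tuple(prefix)


def assign_with_ties(hits: List[Tuple[Lineage, int]]) -> Tuple[Lineage, bool]:
    """Pick a lineage from (lineage, score) hits, handling ties.

    Returns (lineage, tied). When several distinct lineages share the
    best score, the lineage is truncated to the level above the tie.
    """
    if not hits:
        return (), False
    best = max(score for _, score in hits)
    tied = sorted({lineage for lineage, score in hits if score == best})
    return common_prefix(tied), len(tied) > 1


# Hypothetical hits: the order name and the scores are invented.
hits = [
    (("o__SharedOrder", "f__Acutalibacteraceae", "g__Anaeromassilibacillus"), 40),
    (("o__SharedOrder", "f__Oscillospiraceae", "g__Flavonifractor"), 40),
    (("o__SharedOrder", "f__Oscillospiraceae", "g__Other"), 12),
]
lineage, tied = assign_with_ties(hits)
print(lineage, tied)  # ('o__SharedOrder',) True
```

With something like this, the contig above would be assigned at the level above the family tie and flagged as tied, rather than flipping between the two families depending on which match happens to be reported first.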