taxtriage
Top Hits - Improper strain-level interpretations
Description of the bug
- E. coli really struggles in Kraken2: reads end up classified across hundreds of strains, which makes a mess of the results. One solution is to make the LCA mapping more conservative. Alternatively, across an entire species such as E. coli, we can compare the summed counts of its children (S1, S2, etc.) with the count for the species itself; if the children's sum is LESS than the species count, omit them from the top hits and report only the species.
- Flu labeling is full of "reference genomes" in the metadata at the species level, and false positives in the Kraken2 (k2) output lead to S1-or-lower taxa being missed when top hits are filtered. Here we need to work in the opposite direction: for each of the top 10 species, travel down the tree to the lowest "child" within that species and use it IF the clade count assigned to it is higher than the count at the species level.
Essentially, false positives are making a mess of which references we pull for alignment, and they are filtering out many true top hits for heavily populated assemblies such as flu or E. coli.
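
A minimal Python sketch of the two proposals above, not taxtriage's actual implementation. The `Taxon` class and both function names are hypothetical, and the read counts are made up for illustration. It assumes each taxon carries the two counts a Kraken2 report provides (reads in the clade rooted at the taxon, and reads assigned directly to the taxon). It also reads "the species level" as the reads assigned directly to the species, since a child's clade count can never exceed its parent's clade count.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """Hypothetical node built from one Kraken2 report row."""
    name: str
    rank: str          # Kraken2 rank code, e.g. "S", "S1", "S2"
    clade_reads: int   # reads in the clade rooted at this taxon
    direct_reads: int  # reads assigned directly to this taxon
    children: list["Taxon"] = field(default_factory=list)


def collapse_strains(species: Taxon) -> list[Taxon]:
    """E. coli case: keep only the species when its strains are weakly supported.

    Assumption: "summation of children" is the sum of the children's clade
    reads, compared against the reads assigned directly to the species.
    """
    child_total = sum(child.clade_reads for child in species.children)
    if child_total < species.direct_reads:
        return [species]
    return [species, *species.children]


def lowest_supported_child(species: Taxon) -> Taxon:
    """Flu case: follow the heaviest child down to the lowest taxon.

    The lowest taxon is used only if its clade reads exceed the reads
    assigned directly to the species; otherwise the species is kept.
    """
    node = species
    while node.children:
        node = max(node.children, key=lambda child: child.clade_reads)
    if node is not species and node.clade_reads > species.direct_reads:
        return node
    return species


# Made-up counts: most E. coli reads sit at the species itself.
ecoli = Taxon("Escherichia coli", "S", 1030, 1000, [
    Taxon("strain a", "S1", 10, 10),
    Taxon("strain b", "S1", 20, 20),
])

# Made-up counts: most flu reads sit below the species.
flu = Taxon("Influenza A virus", "S", 1000, 50, [
    Taxon("subtype a", "S1", 900, 100, [
        Taxon("isolate a1", "S2", 800, 800),
    ]),
    Taxon("subtype b", "S1", 50, 50),
])

print([t.name for t in collapse_strains(ecoli)])  # ['Escherichia coli']
print(lowest_supported_child(flu).name)           # isolate a1
```

Both thresholds are strict comparisons here; whether a ratio or a minimum read count would work better on real samples is an open question.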
Command used and terminal output
No response
Relevant files
No response
System information
No response