
Top Hits - Improper strain-level interpretations

Open Merritt-Brian opened this issue 4 months ago • 0 comments

Description of the bug

  1. E. coli really struggles in Kraken2: reads end up scattered across hundreds of strain-level classifications. One solution is to make the LCA mapping more conservative. Alternatively, we can compare across an entire species such as E. coli: if the summed counts of its children (S1, S2, etc.) are LESS than the count for the species itself, omit the children from the top hits and report only the species.

  2. Flu labeling is full of "reference genomes" in the metadata at the species level, and false positives in the k2 output lead to S1 or below being missed when top hits are filtered. Here we need to work in the opposite direction: for each of the top 10 species, travel down the tree to the lowest "child" within that species and use it IF the clade count assigned to it is higher than the count at the species level.
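
A minimal sketch of the check in item 1, not the taxtriage implementation. The `Taxon` class and `collapse_to_species` function are hypothetical names. It also assumes that "less than a given species" means comparing the summed clade reads of the children against the reads assigned directly to the species node, since the children can never exceed the species clade total; that reading is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxon:
    """One row of a Kraken2-style report (hypothetical container)."""
    name: str
    rank: str          # rank code such as "S", "S1", "S2"
    clade_reads: int   # reads in the clade rooted at this taxon
    direct_reads: int  # reads assigned directly to this taxon
    children: list[Taxon] = field(default_factory=list)


def collapse_to_species(species: Taxon) -> list[Taxon]:
    """Return the top-hit candidates to keep for one species.

    Assumption: children are dropped when their summed clade reads are
    lower than the reads assigned directly to the species.
    """
    below_species = sum(child.clade_reads for child in species.children)
    if below_species < species.direct_reads:
        return [species]              # report the species only
    return list(species.children)     # otherwise keep the children as candidates


# Illustrative counts only, not real data.
ecoli = Taxon("Escherichia coli", "S", 1000, 700, [
    Taxon("strain A", "S1", 120, 120),
    Taxon("strain B", "S1", 180, 180),
])
kept = [t.name for t in collapse_to_species(ecoli)]
```

Here the two strains together hold 300 reads against 700 assigned directly to the species, so only the species is kept.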

Essentially, false positives are making a mess of which references we pull for alignment and are filtering out many true top hits for heavily populated assemblies such as flu or E. coli.
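
The opposite direction described in item 2 could be sketched as follows. Again, `Taxon` and `lowest_supported_child` are hypothetical names, and comparing a child's clade reads against the reads assigned directly at the species level is an assumed interpretation of "higher than at the species level". When several children qualify, this sketch follows the one with the most clade reads, which is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxon:
    """One row of a Kraken2-style report (hypothetical container)."""
    name: str
    rank: str          # rank code such as "S", "S1", "S2"
    clade_reads: int   # reads in the clade rooted at this taxon
    direct_reads: int  # reads assigned directly to this taxon
    children: list[Taxon] = field(default_factory=list)


def lowest_supported_child(species: Taxon) -> Taxon:
    """Walk down from a species to its deepest well-supported descendant.

    Assumption: a descendant is preferred while its clade reads exceed
    the reads assigned directly to the species.
    """
    node = species
    while node.children:
        best = max(node.children, key=lambda child: child.clade_reads)
        if best.clade_reads <= species.direct_reads:
            break  # the species level has more support; stop descending
        node = best
    return node


# Illustrative counts only, not real data.
flu = Taxon("Influenza A virus", "S", 900, 50, [
    Taxon("subtype X", "S1", 600, 100, [
        Taxon("isolate X1", "S2", 500, 500),
    ]),
    Taxon("subtype Y", "S1", 250, 250),
])
chosen = lowest_supported_child(flu)
```

With these counts the walk passes through "subtype X" and ends at "isolate X1", so the reference for that isolate would be pulled instead of the species-level reference genome.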

Command used and terminal output

No response

Relevant files

No response

System information

No response

Merritt-Brian • Oct 04 '24 13:10