kraken2

Question: Matching process

Open tramelliwe opened this issue 1 year ago • 1 comments

Hey, thank you for this powerful tool!

I've got a question relating to the k-mer matching process. From what I understand, this only relies on exact matching, as opposed to popular alignment tools that do accept mismatches. Therefore I'm wondering why, when I start with an input of reads that were unsuccessfully aligned to the human genome, I still have 96% of the reads that are matched to the human genome when using Kraken2? I would expect that, if there was indeed an exact match of my read with the human genome, then the alignment software would have successfully aligned it. For info, my reads are ~200bp-long, come from ONT sequencing, and have been aligned to the human genome using Minimap2. Any help on this would be appreciated!

tramelliwe, Jan 18 '24 09:01

The matches reported by KrakenUniq require only a single 31-bp match. Kraken2 requires a similar match, and just 1 k-mer is enough. However, a single k-mer hit is not enough for the default settings of most aligners, such as Minimap2 or Bowtie2. Those aligners won't report a match to human if all that matches in a 200 bp read is a single 31-mer. For Bowtie2, you can adjust its settings so that it reports much shorter matches if you want it to.

salzberg, Jan 22 '24 16:01
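
To make the point in the reply above concrete, here is a minimal toy sketch in Python. It is not how Kraken2 or KrakenUniq are implemented (the real tools use a precomputed database and taxonomic assignment); it only shows how a 200 bp read can share one exact 31-mer with a reference while the rest of the read is unrelated. The sequences, the positions, and the helper name `kmers` are all made up for illustration.

```python
import random

K = 31  # k-mer length quoted in the reply above

def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Toy "reference genome": 1000 random bases (purely illustrative).
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(1000))

# Toy 200 bp read: random sequence with a single exact 31-mer
# copied from the reference and placed at read position 80.
seed_kmer = reference[500:500 + K]
noise = "".join(rng.choice("ACGT") for _ in range(200))
read = noise[:80] + seed_kmer + noise[80 + K:]

# Exact k-mer lookup: any shared k-mer counts as a hit.
shared = kmers(read) & kmers(reference)
kmer_hit = len(shared) >= 1

# Fraction of the read covered by the one guaranteed exact match.
covered = K / len(read)

print(f"read length: {len(read)}")
print(f"exact {K}-mer hit: {kmer_hit}")
print(f"fraction of read covered by that {K}-mer: {covered:.3f}")
```

In this sketch the exact-match lookup reports a hit, yet only about 15% of the read is guaranteed to agree with the reference, which is far less than an aligner run with default settings would typically accept as an alignment.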

Thank you, that makes sense!

tramelliwe, Jul 09 '24 13:07