
Re-evaluate impact of error correction

Open standage opened this issue 6 years ago • 0 comments

Performing error correction drastically reduces the sequence content (specifically the number of distinct k-mers) in each data set, and accordingly the amount of memory required to track k-mer counts accurately. We were pretty enthusiastic about this improvement at one point, but later abandoned it because it led to some false negatives.
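To make the memory argument concrete, here is a toy sketch of why sequencing errors inflate the number of distinct k-mers. It is not kevlar's counting code: the `distinct_kmers` helper, the reads, and the choice of k are made up for illustration, and an exact Python set stands in for whatever counting structure is actually used.

```python
def distinct_kmers(reads, k):
    """Return the set of distinct k-mers observed across all reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


# Hypothetical toy data: three reads from the same locus.
template = "ACGTTGCAAGGCTTAC"
# One read carries a single-base error (A -> T at 0-based position 8).
reads_raw = [template, template, "ACGTTGCATGGCTTAC"]
# After an idealized error correction step, the erroneous base is repaired.
reads_corrected = [template, template, template]

k = 5
n_raw = len(distinct_kmers(reads_raw, k))
n_corrected = len(distinct_kmers(reads_corrected, k))
print(f"distinct {k}-mers before correction: {n_raw}")
print(f"distinct {k}-mers after correction:  {n_corrected}")
```

An isolated substitution can introduce up to k novel k-mers, so at realistic error rates and coverage the erroneous k-mers can make up a large share of the distinct k-mers that have to be tracked.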

I think this decision was based on a small number of manually inspected variants (perhaps even 1), and not on overall statistics. In any case, all of the variants involved were SNVs, where our superiority is already marginal. We should re-investigate kevlar's performance on the latest simulations using error-corrected data.

standage Jun 18 '18 20:06