kevlar
Write k-mers to banded counttables in a single pass
Suggestion from @drtamermansour: in a single pass over the data, write count tables to N files (one for each band). Running kevlar find in N bands would then not require N passes over the entire data set, just loading the count tables from disk N times.
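To make the suggestion concrete, here's a minimal Python sketch contrasting the two approaches. It assumes a hypothetical banding scheme where a k-mer belongs to band crc32(kmer) % N. It also uses plain Counters in place of real count tables and ignores reverse complements. None of these function names are kevlar's API.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield every k-mer (substring of length k) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def band_of(kmer: str, nbands: int) -> int:
    """Assign a k-mer to a band by hashing it (hypothetical scheme).

    crc32 is used only because it is deterministic across runs;
    Python's built-in hash() is salted per process for strings.
    """
    return zlib.crc32(kmer.encode()) % nbands


def count_band(reads: List[str], k: int, nbands: int, band: int) -> Counter:
    """Current approach: one full pass over the reads for each band."""
    table = Counter()
    for read in reads:
        for km in kmers(read, k):
            if band_of(km, nbands) == band:
                table[km] += 1
    return table


def count_all_bands(reads: List[str], k: int, nbands: int) -> List[Counter]:
    """Suggested approach: a single pass fills all N band tables at once.

    In the suggestion, each table would then be written to its own file,
    so a banded kevlar find run loads one table per band from disk.
    """
    tables = [Counter() for _ in range(nbands)]
    for read in reads:
        for km in kmers(read, k):
            tables[band_of(km, nbands)][km] += 1
    return tables


reads = ["ACGTACGTGG", "TTACGTACGA"]
single_pass = count_all_bands(reads, k=4, nbands=3)
multi_pass = [count_band(reads, 4, 3, b) for b in range(3)]
print(single_pass == multi_pass)
```

Both approaches produce identical per-band tables; count_band just reads the data N times. One caveat: count_all_bands keeps all N tables in memory at once. So if banding is being used to cap memory, a real implementation would likely need to spill k-mers to per-band files during the pass rather than keeping every table resident.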
I just wanted to capture this suggestion; I have some concerns, and I'm not sure it would yield much benefit.
- Populating count tables from Fastq reads is only a small portion of the overall runtime of kevlar find. Loading from count table files rather than the Fastq files again probably won't make a huge difference in overall runtime.
- We have[1] to do a second pass over the case reads anyway, and this pass is the big time consumer. Any optimizations we wanted to do in the future would probably benefit more from focusing on this pass rather than on the banded loading.
And in any case, this is all optimization: there's still work to do to get reliable results first!
[1] There are ways we could investigate to do this in a streaming fashion, but for now I'm happy with saying we have to do a second pass over the reads. :-)