
Write k-mers to banded counttables in a single pass

Open · standage opened this issue 7 years ago · 1 comment

Suggestion from @drtamermansour: in a single pass over the reads, write count tables to N files, one for each band. Running kevlar find in N bands would then not require N passes over the entire data set. Instead, it would only need to load the count tables from disk N times.
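To make the idea concrete, here is a minimal sketch of single-pass banded counting. It is not kevlar's or khmer's implementation. All function names are hypothetical, CRC32 stands in for whatever hash the real count tables use, an exact `Counter` replaces the probabilistic count table, and JSON files replace the binary count table format. The key property is that each k-mer is assigned to exactly one band by a deterministic hash. Python's built-in `hash()` is unsuitable here because string hashing is randomized per process, and the band assignment has to agree between the process that writes the tables and the one that later loads them.

```python
import json
import os
import tempfile
import zlib
from collections import Counter

# Complement table for DNA; characters other than ACGT (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer[::-1].translate(_COMPLEMENT)
    return min(kmer, rc)


def band_of(kmer, numbands):
    """Assign a k-mer to a band using a deterministic hash (CRC32 here, an assumption)."""
    return zlib.crc32(kmer.encode()) % numbands


def count_banded(reads, k, numbands):
    """Scan the reads ONCE, filling one counter per band (hypothetical helper)."""
    tables = [Counter() for _ in range(numbands)]
    for seq in reads:
        for i in range(len(seq) - k + 1):
            kmer = canonical(seq[i:i + k])
            tables[band_of(kmer, numbands)][kmer] += 1
    return tables


def write_tables(tables, outdir, prefix="counts"):
    """Write one file per band and return the file paths."""
    paths = []
    for band, table in enumerate(tables):
        path = os.path.join(outdir, f"{prefix}.band{band}.json")
        with open(path, "w") as fh:
            json.dump(table, fh)
        paths.append(path)
    return paths


def load_table(path):
    """Load a single band's counts back from disk."""
    with open(path) as fh:
        return Counter(json.load(fh))


if __name__ == "__main__":
    reads = ["ACGTACGTGA", "TTGACCATGA"]
    with tempfile.TemporaryDirectory() as tmp:
        # One pass over the reads produces all N band files.
        paths = write_tables(count_banded(reads, k=5, numbands=4), tmp)
        # Each band of the downstream analysis then only loads its own table.
        for band, path in enumerate(paths):
            table = load_table(path)
            print(band, sum(table.values()))
```

In a real run, the reads would be parsed from the Fastq files and only one band's table would be held in memory at a time. The tradeoff the comments below weigh is whether skipping N-1 re-reads of the input is worth it when counting is only a small part of the total runtime.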

I just wanted to capture this suggestion. I have some concerns, though, and I'm not sure it would yield much benefit.

  • Populating count tables from Fastq reads is only a small portion of the overall runtime of kevlar find, so loading count tables from files instead of re-reading the Fastq files probably won't make a big difference in total runtime.
  • We have[1] to do a second pass over the case reads anyway, and that pass is the main time consumer. Any future optimization effort would probably pay off more if it focused there rather than on banded loading.

And in any case, this is all optimization: there's still work to do to get reliable results first!


[1] There are ways we could investigate to do this in a streaming fashion, but for now I'm happy with saying we have to do a second pass over the reads. :-)

standage · Feb 01 '17 22:02