
Compose control counttables for `kevlar novel` step

Open standage opened this issue 6 years ago • 2 comments

After counting k-mers for each control sample, we should investigate composing the counttables into a single nodetable before running `kevlar novel`. This should have a couple of synergistic benefits.

  • We have a single table instead of 2 (or more), reducing the time spent on k-mer abundance queries
  • A nodetable consumes 1/8 of the memory of a counttable with the same number of buckets

The cost is, of course, another pass over the "data". But it should be possible to build the nodetable directly from the underlying counttables themselves, without iterating over the reads again, so the "data" should be quite small and manageable.
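
To make the idea concrete, here is a minimal sketch in plain Python. It does not use kevlar's or khmer's actual API: `ToyCounttable`, `ToyNodetable`, `bucket_indices`, and `compose_nodetable` are all hypothetical stand-ins, and the sketch assumes that every control counttable has the same number of buckets, the same number of rows, and the same hash functions. Under those assumptions a bit in the nodetable can be set whenever the corresponding counter is nonzero in any control, so the reads are never touched again.

```python
import hashlib


def bucket_indices(kmer, num_buckets, num_tables):
    """Hypothetical hash scheme: one salted hash, hence one bucket index, per row."""
    indices = []
    for row in range(num_tables):
        digest = hashlib.blake2b(
            kmer.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        indices.append(int.from_bytes(digest, "big") % num_buckets)
    return indices


class ToyCounttable:
    """Toy Count-Min-style sketch: one byte (8 bits) per bucket."""

    def __init__(self, num_buckets, num_tables=4):
        self.num_buckets = num_buckets
        self.num_tables = num_tables
        self.rows = [bytearray(num_buckets) for _ in range(num_tables)]

    def add(self, kmer):
        indices = bucket_indices(kmer, self.num_buckets, self.num_tables)
        for row, idx in enumerate(indices):
            if self.rows[row][idx] < 255:  # saturate instead of overflowing
                self.rows[row][idx] += 1

    def get(self, kmer):
        indices = bucket_indices(kmer, self.num_buckets, self.num_tables)
        return min(self.rows[row][idx] for row, idx in enumerate(indices))

    def size_in_bytes(self):
        return sum(len(row) for row in self.rows)


class ToyNodetable:
    """Toy Bloom-filter-style table: one bit per bucket."""

    def __init__(self, num_buckets, num_tables=4):
        self.num_buckets = num_buckets
        self.num_tables = num_tables
        self.rows = [bytearray((num_buckets + 7) // 8) for _ in range(num_tables)]

    def set_bit(self, row, idx):
        self.rows[row][idx // 8] |= 1 << (idx % 8)

    def contains(self, kmer):
        indices = bucket_indices(kmer, self.num_buckets, self.num_tables)
        return all(
            (self.rows[row][idx // 8] >> (idx % 8)) & 1
            for row, idx in enumerate(indices)
        )

    def size_in_bytes(self):
        return sum(len(row) for row in self.rows)


def compose_nodetable(counttables):
    """Build a nodetable from the counttables' buckets, without re-reading any reads."""
    first = counttables[0]
    shape = (first.num_buckets, first.num_tables)
    if any((ct.num_buckets, ct.num_tables) != shape for ct in counttables):
        raise ValueError("all counttables must share the same dimensions")
    nodetable = ToyNodetable(first.num_buckets, first.num_tables)
    for row in range(first.num_tables):
        for idx in range(first.num_buckets):
            # A bit is set if any control saw a k-mer hashing to this bucket.
            if any(ct.rows[row][idx] for ct in counttables):
                nodetable.set_bit(row, idx)
    return nodetable


# Two toy "control samples" with made-up k-mers.
control1 = ToyCounttable(1024)
control2 = ToyCounttable(1024)
for kmer in ("GATTACA", "ACGTACG"):
    control1.add(kmer)
control2.add("TTGACCA")

controls = compose_nodetable([control1, control2])
print(controls.contains("GATTACA"))  # True: present in control1
print(control1.size_in_bytes(), controls.size_in_bytes())  # 4096 512
```

In this toy version the pass over the "data" is a single sweep over the buckets, and the composite table is 1/8 the size of one counttable. Two trade-offs are worth keeping in mind: the composite only records presence, not abundance, and its false-positive rate is at least as high as that of any single control table, because bits set by different samples can combine to make an unseen k-mer look present.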

standage, Jun 18 '18 20:06