kevlar issues

[Meta] Memory and runtime performance improvements

1

Here I want to keep track of some ideas we've thought of and, in some cases, discussed to improve kevlar's memory consumption and speed. I'll link to relevant issue threads...

standage

optimization

Can kevlar work with 10s-100s of bacterial genomes?

1

@ctb suggests it can.

tseemann

question

Titus' thoughts about "user experience"

2

- [x] pip installable
- [ ] bioconda integration
- [ ] JOSS submission concurrent with journal submission
- [ ] community engagement
- [ ] tutorials --> jupyter notebooks...

standage

Compose control counttables for `kevlar novel` step

2

After counting k-mers for each control sample, we should investigate composing the counttables into a single nodetable before running `kevlar novel`. This should have a couple of synergistic benefits.
- We...

standage

optimization
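
To make the idea of composing control tables concrete, here is a minimal sketch. It is not kevlar's or khmer's implementation: plain dicts and a Python set stand in for the counttables and the nodetable, the function names `compose_controls` and `is_novel` are hypothetical, and the thresholds are assumed to behave like the `--ctrl-max` and `--case-min` options in the tutorial command.

```python
def compose_controls(control_counts, ctrl_max=0):
    """Collapse per-control k-mer counts into one presence set.

    control_counts: iterable of dicts mapping k-mer -> abundance
    (a stand-in for one counttable per control sample).
    A k-mer enters the composed set if ANY control exceeds ctrl_max.
    """
    composed = set()
    for counts in control_counts:
        composed.update(kmer for kmer, n in counts.items() if n > ctrl_max)
    return composed


def is_novel(kmer, case_counts, composed, case_min=6):
    """A k-mer is a candidate if it is abundant in the case sample
    and absent from the composed control set."""
    return case_counts.get(kmer, 0) >= case_min and kmer not in composed


mother = {"AAAA": 3, "CCCC": 0}
father = {"GGGG": 1}
proband = {"AAAA": 10, "TTTT": 7, "CGCG": 2}

composed = compose_controls([mother, father])
candidates = [k for k in proband if is_novel(k, proband, composed)]
```

Under these assumptions, each candidate k-mer needs one membership test against the composed set instead of one abundance query per control sample.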

Re-evaluate impact of error correction

Performing error correction drastically reduces the sequence content (specifically the number of distinct k-mers) in each data set, and accordingly the amount of memory required to track k-mer counts accurately....

standage

optimization

Filtering reads with ambiguous content

Our current handling of reads with ambiguous content is as follows.
- For counting, kevlar uses khmer's default bulk loading behavior, which is to ignore all k-mers with ambiguous content....

standage

enhancement

accuracy
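
The counting behavior described above (ignoring every k-mer that contains an ambiguous base) can be illustrated in a few lines of pure Python. This is only a sketch of the behavior, not khmer's code; the function name `unambiguous_kmers` is hypothetical, and treating anything other than A, C, G, or T as ambiguous is an assumption.

```python
def unambiguous_kmers(read, k):
    """Yield each k-mer of `read` made up only of A, C, G, and T.

    K-mers overlapping an ambiguous base (such as N) are skipped,
    but the rest of the read still contributes k-mers.
    """
    valid = set("ACGT")
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= valid:
            yield kmer


kmers = list(unambiguous_kmers("ACGTNACGT", 4))
```

A single `N` in the read removes the four 4-mers that overlap it, while the k-mers on either side are still counted.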

Investigate using minimizers to reduce memory while searching for interesting k-mers

Suggestion from @camillescott. We should discuss in detail some time.

standage

optimization
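
For readers unfamiliar with minimizers, the sketch below shows the basic idea: each k-mer is represented by its lexicographically smallest m-mer, and neighboring k-mers often share that representative. How this would be applied in kevlar is not specified in the issue; the binning shown here is one common use and is an assumption, as are the function names `minimizer` and `bin_kmers`.

```python
from collections import defaultdict


def minimizer(kmer, m):
    """Return the lexicographically smallest m-mer within `kmer`."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def bin_kmers(seq, k, m):
    """Group the k-mers of `seq` by their minimizer.

    Each bin could then be processed independently, so only one bin's
    k-mers need to be held in memory at a time.
    """
    bins = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        bins[minimizer(kmer, m)].append(kmer)
    return dict(bins)


bins = bin_kmers("ACGTACG", k=5, m=3)
```

This version ignores reverse complements; a canonical minimizer would compare each m-mer against its reverse complement as well.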

Buffer abundance queries in `kevlar novel`

The longest step of the kevlar pipeline by far is finding the novel k-mers, and cache misses from Count-Min Sketch k-mer abundance queries are a big part of this. I...

standage

optimization
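
The issue text is truncated here, so the exact proposal is not known. One generic way to buffer abundance queries is to collect a batch of k-mers, sort them by the table slot they hash to, and then look them up in slot order so that memory is touched more sequentially. The sketch below illustrates that pattern with a toy single-row table; `ToyCountTable` and its methods are hypothetical and are not khmer's API, and pure Python will not show the cache effect itself.

```python
import zlib


class ToyCountTable:
    """Single-row counting table; a stand-in for a Count-Min Sketch row."""

    def __init__(self, size):
        self.size = size
        self.counts = [0] * size

    def _index(self, kmer):
        # crc32 is used only because it is deterministic and in the stdlib.
        return zlib.crc32(kmer.encode()) % self.size

    def add(self, kmer):
        self.counts[self._index(kmer)] += 1

    def get(self, kmer):
        return self.counts[self._index(kmer)]

    def get_many(self, kmers):
        """Answer a buffered batch of queries in table-slot order.

        Results are returned in the same order as the input k-mers.
        """
        indexed = [(self._index(kmer), pos) for pos, kmer in enumerate(kmers)]
        results = [0] * len(kmers)
        for slot, pos in sorted(indexed):
            results[pos] = self.counts[slot]
        return results


table = ToyCountTable(1024)
for kmer in ["ACGT", "ACGT", "TTTT"]:
    table.add(kmer)

batch = ["TTTT", "ACGT", "GGGG"]
abundances = table.get_many(batch)
```

The batched lookup returns exactly what individual `get` calls would, so buffering changes only the order of memory accesses, not the answers.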

Tutorial: AssertionError: Your sequence must be DNA!

2

From the [tutorial](http://kevlar.readthedocs.io/en/stable/quick-start.html) with this command:

```
kevlar simplex \
    --case proband.fq.gz --case-min 6 \
    --control mother.fq.gz --control father.fq.gz --ctrl-max 0 \
    --novel-memory 5M --novel-fpr 0.6 --threads 4 \
    --filter-memory...
```

johnsolk

Profiling results

2

At the moment kevlar is painfully slow. We've been philosophizing for a while now about whether this is more likely due to the poor cache locality of khmer's Count-Min Sketch implementation...

standage

optimization

accuracy
