kevlar

Reference-free variant discovery in large eukaryotic genomes

Results 23 kevlar issues

Here I want to keep track of ideas we've had, and in some cases discussed, for improving kevlar's memory consumption and speed. I'll link to relevant issue threads...

optimization

- [x] pip installable
- [ ] bioconda integration
- [ ] JOSS submission concurrent with journal submission
- [ ] community engagement
- [ ] tutorials --> jupyter notebooks...

After counting k-mers for each control sample, we should investigate composing the counttables into a single nodetable before running `kevlar novel`. This should have a couple of synergistic benefits. - We...
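To make the idea concrete, here is a minimal sketch in plain Python. It is not kevlar or khmer code: `Counter` stands in for a counttable, a `set` stands in for a nodetable (assumed here to record only presence or absence), and the function names are hypothetical. The `ctrl_max` and `case_min` parameters mirror the `--ctrl-max` and `--case-min` thresholds from the command line.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in a list of reads (exact counts, no sketch)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def compose_controls(control_tables, ctrl_max=0):
    """Collapse per-control count tables into one presence/absence set.

    A k-mer is recorded if its abundance exceeds ctrl_max in at least
    one control, i.e. if it would disqualify a candidate novel k-mer.
    """
    present = set()
    for table in control_tables:
        present.update(kmer for kmer, n in table.items() if n > ctrl_max)
    return present


def novel_kmers(case_table, control_present, case_min=2):
    """Keep case k-mers that are abundant enough and absent from the composite."""
    return {
        kmer
        for kmer, n in case_table.items()
        if n >= case_min and kmer not in control_present
    }


mother = count_kmers(["ACGTAC"], k=4)
father = count_kmers(["CGTACG"], k=4)
proband = count_kmers(["ACGTAC", "ACGTTT", "ACGTTT"], k=4)

controls = compose_controls([mother, father], ctrl_max=0)
novel = novel_kmers(proband, controls, case_min=2)
```

With a composite like this, each candidate k-mer needs one membership check instead of one abundance query per control sample.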

optimization

Performing error correction drastically reduces the sequence content (specifically the number of distinct k-mers) in each data set, and accordingly the amount of memory required to track k-mer counts accurately....
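As a toy illustration of why errors inflate the number of distinct k-mers (plain Python, with made-up sequences and a hypothetical helper name, not kevlar code): a single substitution in the middle of a read can introduce up to k k-mers that occur nowhere else in the data.

```python
def distinct_kmers(reads, k):
    """Return the set of distinct k-mers across all reads."""
    return {
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    }


k = 4
clean_read = "ACGTTGCATG"
# Same read with one substitution (G -> A) at 0-based position 5.
error_read = "ACGTTACATG"

clean_only = distinct_kmers([clean_read], k)
with_error = distinct_kmers([clean_read, error_read], k)

# k-mers that exist only because of the sequencing error.
erroneous = with_error - clean_only
```

Every k-mer in `erroneous` would still need to be tracked when counting, which is the memory cost that error correction removes.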

optimization

Our current handling of reads with ambiguous content is as follows.
- For counting, kevlar uses khmer's default bulk loading behavior, which is to ignore all k-mers with ambiguous content....
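For reference, "ignore all k-mers with ambiguous content" can be pictured with the following sketch. It is plain Python written for illustration, not khmer's implementation, and it assumes that any character other than A, C, G, or T counts as ambiguous.

```python
VALID_BASES = set("ACGT")


def unambiguous_kmers(read, k):
    """Yield each k-mer of `read` made up only of A, C, G, and T.

    Windows that overlap an ambiguous base (such as N) are skipped;
    the rest of the read is still used.
    """
    read = read.upper()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= VALID_BASES:
            yield kmer


kmers = list(unambiguous_kmers("ACGNTACG", k=3))
```

Note that a single N removes up to k windows from the read, not just one.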

enhancement
accuracy

Suggestion from @camillescott. We should discuss in detail some time.

optimization

The longest step of the kevlar pipeline by far is finding the novel k-mers, and cache misses from Count-Min Sketch k-mer abundance queries are a big part of this. I...
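To show where the cache misses come from, here is a minimal Count-Min Sketch in plain Python. This is an illustrative sketch only, not khmer's implementation: the class name, the use of salted BLAKE2 hashes, and the equal-width rows are all assumptions made to keep the example short. The point is that every query reads one slot per row, and those slots land at effectively random positions in large arrays.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch: `depth` rows of `width` counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, kmer):
        """Yield one (row, column) pair per row for this k-mer."""
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, kmer):
        for row, col in self._slots(kmer):
            self.tables[row][col] += 1

    def get(self, kmer):
        # The minimum across rows is the estimate; collisions can only
        # inflate a counter, so the estimate never undercounts.
        return min(self.tables[row][col] for row, col in self._slots(kmer))


sketch = CountMinSketch(width=1024, depth=4)
for _ in range(3):
    sketch.add("ACGT")
sketch.add("TTTT")

slots = list(sketch._slots("ACGT"))
estimate = sketch.get("ACGT")
```

Each `get` touches `depth` unrelated memory locations, so with tables far larger than the CPU cache, most of those reads would be expected to miss.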

optimization

From the [tutorial](http://kevlar.readthedocs.io/en/stable/quick-start.html) with this command: ``` kevlar simplex \ --case proband.fq.gz --case-min 6 \ --control mother.fq.gz --control father.fq.gz --ctrl-max 0 \ --novel-memory 5M --novel-fpr 0.6 --threads 4 \ --filter-memory...

At the moment kevlar is painfully slow. We've been philosophizing for a while now about whether this is more likely due to the poor cache locality of khmer's Count-Min Sketch implementation...

optimization
accuracy