Ilya Shlyakhter
Re: use case, when assembling genomes we sometimes end up with SNPs, represented using ambiguity codes. It is then useful to be able to get the set of kmers in...
Can just have a limit, for each kmer with ambiguities include up to X concretizations. And simply ignore kmers with more than a given number of Ns.
E.g. for protein-coding regions I might add ambiguities for the 3rd base of codons, or other less-conserved bases, to account for kmers I might expect to see.
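A rough sketch of the capped expansion described above, in plain Python rather than as a patch to any existing tool. The function names and the `max_expansions` / `max_n` parameters are made up for illustration; the table is the standard IUPAC nucleotide code set. Kmers with too many Ns are skipped, and every other kmer contributes at most `max_expansions` concretizations.

```python
from itertools import islice, product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def concretize(kmer, max_expansions=16, max_n=1):
    """Expand one kmer with ambiguity codes into concrete kmers.

    Hypothetical helper: returns [] if the kmer has more than max_n Ns
    or contains a character outside the IUPAC table; otherwise returns
    at most max_expansions concretizations.
    """
    kmer = kmer.upper()
    if kmer.count("N") > max_n or any(b not in IUPAC for b in kmer):
        return []
    choices = [IUPAC[b] for b in kmer]
    # product() is lazy, so islice stops the expansion at the cap.
    return ["".join(c) for c in islice(product(*choices), max_expansions)]


def kmers_with_ambiguities(seq, k, max_expansions=16, max_n=1):
    """Collect the set of concrete kmers of length k present in seq."""
    found = set()
    for i in range(len(seq) - k + 1):
        found.update(concretize(seq[i:i + k], max_expansions, max_n))
    return found
```

For example, `kmers_with_ambiguities("ACRT", 3)` covers both alleles of the R (A/G) site in each of the two overlapping 3-mers. Truncating at the cap keeps an arbitrary subset of the concretizations; dropping such kmers entirely would be the other reasonable choice.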
@marekkokot this would be especially useful since multiple threads could then be used for large BAMs. Currently, a BAM has to be first converted to FASTA, and standard tools like samtools...
P.S. No need to write out the new BAM; just writing the names of the reads that pass the filter would suffice (Picard can then filter the BAM).
@marekkokot not sure how hard it is to add support for filtering BAMs directly (or just outputting the names of passing reads to a text file), but this would be...
2. Have samples from an infectious outbreak, figuring out who infected whom with what. De novo assembled contigs from each sample, now want to find contigs which have many kmers...
Implementing the functionality in this issue would also solve issue #2655. Maybe also clarify in the issue title that "concurrent run" is only for runs from different working directories...
Running locally with docker to simulate the CI builds.