add flat list of kmers as input format

Open notestaff opened this issue 6 years ago • 0 comments

Add an option to create a kmer database from a plain text file that lists kmers and their counts, in the same format that kmc_tools transform dump outputs. If the counts are omitted, set them all to 1.
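To make the requested input format concrete, here is a minimal Python sketch of how such a flat list could be read. It is not KMC code: the function name `parse_kmer_list` is hypothetical, and it assumes one kmer per line followed by an optional whitespace-separated count, which is how I understand the dump output to look. Summing the counts of repeated kmers is also just one possible choice.

```python
from typing import Dict, Iterable


def parse_kmer_list(lines: Iterable[str], default_count: int = 1) -> Dict[str, int]:
    """Parse 'KMER[<whitespace>COUNT]' lines into a kmer -> count mapping.

    Hypothetical helper, not part of KMC. Assumes all kmers share one length.
    """
    counts: Dict[str, int] = {}
    k = None
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        kmer = fields[0].upper()
        if set(kmer) - set("ACGT"):
            raise ValueError(f"line {lineno}: non-ACGT symbol in {kmer!r}")
        if k is None:
            k = len(kmer)  # the first kmer fixes k for the whole file
        elif len(kmer) != k:
            raise ValueError(f"line {lineno}: expected length {k}, got {len(kmer)}")
        # A missing count column falls back to the default (1), as requested above.
        count = int(fields[1]) if len(fields) > 1 else default_count
        # Assumption: a kmer listed more than once has its counts summed.
        counts[kmer] = counts.get(kmer, 0) + count
    return counts
```

For example, `parse_kmer_list(open("kmers.txt"))` (file name made up) would give a dictionary that could then be handed to whatever builds the database.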

Use case: given known sequences from a taxon, I want to create kmers not just from these sequences, but also from sequences I might expect to see based on conservation information. For example, if a multiple alignment of virus sequences shows that the base at a given genome position may be T, C or G (i.e. the position is not conserved), novel strains of this virus might well have an A there, so I would want to add kmers with A at that position. This can increase sensitivity for detecting new/unknown strains.
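As a rough illustration of the use case, the sketch below generates the extra kmers from a reference sequence and a set of non-conserved positions. The function name `expand_kmers` and its parameters are hypothetical, positions are assumed to be 0-based, and it simply allows any of A, C, G, T at each non-conserved position. Note that a window covering m such positions yields 4^m kmers, so this is only practical when variable positions are sparse.

```python
from itertools import product
from typing import Iterable, Set


def expand_kmers(seq: str, variable_positions: Iterable[int], k: int) -> Set[str]:
    """Return all kmers of seq, allowing any base at each non-conserved position.

    Hypothetical helper for illustration; variable_positions are 0-based
    indices into seq.
    """
    variable = set(variable_positions)
    kmers: Set[str] = set()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Offsets within this window that fall on non-conserved positions.
        offsets = [i for i in range(k) if start + i in variable]
        # Enumerate every base combination for those offsets
        # (a single empty combination when the window is fully conserved).
        for bases in product("ACGT", repeat=len(offsets)):
            chars = list(window)
            for offset, base in zip(offsets, bases):
                chars[offset] = base
            kmers.add("".join(chars))
    return kmers
```

The resulting set, written one kmer per line with no counts, is exactly the kind of flat list this issue asks KMC to accept as input.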

notestaff, Jun 06 '18 18:06