KMC
Add flat list of kmers as input format
Add an option to create a kmer database from a simple text file listing kmers and their counts, in the same format that kmc_tools transform dump outputs. If counts are omitted, set them all to 1.
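To make the requested input format concrete, here is a minimal Python sketch of how such a file could be read. It is not KMC code: the function name `parse_kmer_list` is hypothetical, and it assumes one kmer per line, an optional whitespace-separated count, all kmers of equal length, and that duplicate kmers have their counts summed.

```python
from collections import Counter


def parse_kmer_list(lines, default_count=1):
    """Parse 'KMER[<whitespace>COUNT]' lines into a {kmer: count} dict.

    Assumptions (not taken from KMC itself): blank lines are skipped,
    kmers are upper-cased, a missing count means default_count, and
    repeated kmers have their counts added together.
    """
    counts = Counter()
    k = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        kmer = fields[0].upper()
        count = int(fields[1]) if len(fields) > 1 else default_count
        if k is None:
            k = len(kmer)  # the first kmer fixes k for the whole file
        elif len(kmer) != k:
            raise ValueError(f"expected length {k}, got {kmer!r}")
        counts[kmer] += count
    return dict(counts)


if __name__ == "__main__":
    example = ["ACGT\t3", "TTTT", "", "acgt 2"]
    print(parse_kmer_list(example))
```

Whether duplicates should be summed or rejected is a design choice; summing is used here only to keep the sketch short.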
Use case: given known sequences from a taxon, I want to create kmers not just from those sequences, but also from sequences I might expect to see based on conservation information. For example, if a multiple alignment of virus sequences shows that the base at a given genome position may be T, C or G (i.e. the position is not conserved), novel strains of this virus might plausibly have an A there, so I would want to add kmers with A at that position. This can increase sensitivity for detecting new or unknown strains.
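As an illustration of how such a kmer list might be generated, the sketch below expands a reference sequence into every kmer obtained by allowing any base at positions flagged as non-conserved. The function name `expand_kmers`, the 0-based position convention, and the plain `ACGT` alphabet are all assumptions made for this example; how the variable positions are derived from an alignment is left out.

```python
from itertools import product


def expand_kmers(seq, variable_positions, k, alphabet="ACGT"):
    """Return the set of kmers from seq, allowing any base in `alphabet`
    at each 0-based position listed in variable_positions."""
    variable = set(variable_positions)
    kmers = set()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # For each position in the window: all bases if it is variable,
        # otherwise only the base observed in the reference.
        options = [
            alphabet if (start + i) in variable else base
            for i, base in enumerate(window)
        ]
        for combo in product(*options):
            kmers.add("".join(combo))
    return kmers


if __name__ == "__main__":
    # Write a flat list with one kmer per line and no counts.
    for kmer in sorted(expand_kmers("ACGTA", {2}, 3)):
        print(kmer)
```

The number of kmers grows as 4^v for a window containing v variable positions, so this only stays practical when variable positions are sparse relative to k.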