enable amino acid kmer matching through translation to canonicalized codons

Open notestaff opened this issue 4 years ago • 3 comments

Add support for matching amino acid kmers. An amino acid kmer can be represented as a nucleotide kmer in which each amino acid is mapped to a canonical (e.g. lexicographically smallest) codon. An amino acid FASTA file can then be mapped on the fly to a nucleotide file from which kmers are gathered as normal. tblastn/tblastx-like matching can also be enabled by adding options to do three- or six-frame translation of each input nucleotide sequence and then representing the resulting amino acid sequences as nucleotide sequences with canonical codons as above, before extracting kmers; this would again be done on the fly. So the only change is to the code that extracts kmers from FASTAs. @marekkokot
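To make the proposal concrete, here is a rough Python sketch of the mapping described above. It is not KMC code and uses no KMC API. The function names are hypothetical, the standard genetic code is assumed, and emitting `NNN` for unknown residues or codons is my own assumption about how ambiguity might be handled.

```python
from itertools import product

# Standard genetic code in the usual compact layout: bases ordered T, C, A, G,
# with the third codon position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Canonical codon per amino acid: the lexicographically smallest codon
# (one possible convention, as suggested in the issue text).
CANONICAL = {}
for codon, aa in CODON_TO_AA.items():
    if aa not in CANONICAL or codon < CANONICAL[aa]:
        CANONICAL[aa] = codon

UNKNOWN = "NNN"  # assumption: placeholder for residues/codons we cannot map


def protein_to_canonical_dna(protein):
    """Map an amino acid sequence to nucleotides using canonical codons."""
    return "".join(CANONICAL.get(aa, UNKNOWN) for aa in protein.upper())


def canonicalize_frame(dna, offset):
    """Translate one reading frame and re-encode it with canonical codons."""
    dna = dna.upper()
    out = []
    for i in range(offset, len(dna) - 2, 3):
        aa = CODON_TO_AA.get(dna[i:i + 3])
        out.append(CANONICAL[aa] if aa is not None else UNKNOWN)
    return "".join(out)


def reverse_complement(dna):
    """Reverse-complement an A/C/G/T sequence; other characters pass through."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(dna):
    """Return canonical-codon sequences for all six reading frames."""
    strands = (dna, reverse_complement(dna))
    return [canonicalize_frame(s, off) for s in strands for off in range(3)]


if __name__ == "__main__":
    print(protein_to_canonical_dna("MKW"))     # protein input
    print(canonicalize_frame("ATGAAGTGG", 0))  # nucleotide input, frame 0
    print(six_frame("ATGAAGTGG"))
```

With this encoding, an amino acid kmer of length k corresponds to a nucleotide kmer of length 3k, so the protein and the translated nucleotide input yield the same nucleotide kmers wherever they encode the same peptide.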

notestaff, Jan 08 '20 19:01