cobs
amino acid support
Hello COBS team,
Is there any possibility of also supporting amino acid sequences (an alphabet of 20 letters instead of the 4 used for DNA)? I think this would be extremely useful for genome query/search.
Thanks,
Jianshu
It should already support amino acid sequences, in the sense that it indexes free text. Try passing in a .txt file of amino acids and see how it goes.
Thanks for the quick response. I saw that the test folder has fasta files with amino acid sequences, for example sample6.fasta. I will try it now.
Thanks
Jianshu
Note that by default it canonicalizes k-mers (it takes each k-mer and its reverse complement and keeps the lexicographically smaller of the two). See this text from the README:
With the flag --no-canonicalize any letters or text can be indexed.
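To make the canonicalization point concrete, here is a minimal Python sketch of the idea, assuming a plain ACGT alphabet. The helpers `reverse_complement`, `canonical` and `extract_kmers` are hypothetical names for illustration only; this is not COBS's implementation, and it does not reproduce how COBS itself treats non-ACGT letters when canonicalization is on. The `canonicalize` parameter only loosely mirrors the idea behind `--no-canonicalize`.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of COBS.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Reverse a DNA k-mer and complement each base (A<->T, C<->G)."""
    # Raises KeyError for any letter outside ACGT, e.g. amino acid codes.
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def extract_kmers(seq, k, canonicalize=True):
    """Slide a window of length k over seq.

    With canonicalize=False every window is kept verbatim, so any letters
    (such as the 20 amino acid codes) can be indexed.
    """
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if canonicalize:
        return [canonical(w) for w in windows]
    return windows


if __name__ == "__main__":
    print(extract_kmers("TTAGC", 4))  # DNA: canonical k-mers
    print(extract_kmers("MKVLA", 4, canonicalize=False))  # protein: raw k-mers
    try:
        extract_kmers("MKVLA", 4)  # protein letters have no complement
    except KeyError as err:
        print("no reverse complement for", err)
```

For DNA, `TTAG` and its reverse complement `CTAA` collapse to the same canonical k-mer, so a read and its opposite strand match the same index entries. A protein k-mer such as `MKVL` has no reverse complement at all, which is why protein sequences should be indexed without canonicalization.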