amino acid support

Open jianshu93 opened this issue 2 years ago • 3 comments

Hello COBS team,

Is there any possibility of also supporting amino acid sequences (a 20-letter alphabet instead of the 4 letters used for DNA)? I have found this to be extremely useful for genome query/search.

Thanks,

Jianshu

jianshu93, Jun 22 '22 18:06

It should already support amino acid sequences, in the sense that it indexes free text. Pass in a .txt file of amino acids and see how you go?

iqbal-lab, Jun 22 '22 20:06

Thanks for the quick response. I saw that a FASTA file in the test folder contains amino acid sequences, for example sample6.fasta. I will try it now.

Thanks

Jianshu

jianshu93, Jun 22 '22 20:06

Note that by default it tries to canonicalize kmers (taking each kmer and its reverse complement and keeping the smaller of the two). See this text from the README:

With the flag --no-canonicalize any letters or text can be indexed.

iqbal-lab, Jun 22 '22 20:06
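
For readers wondering why canonicalization matters here, the idea described above can be sketched roughly as follows. This is only an illustration in Python, not COBS source code: the function names (`reverse_complement`, `canonical_kmer`, `kmers`) are hypothetical, and "smaller" is assumed to mean plain lexicographic string order. Reverse complement is only defined for the DNA letters A, C, G and T, which is why amino acid text would presumably need canonicalization switched off.

```python
# Illustrative sketch only; not taken from the COBS code base.
# Assumption: "smaller" means lexicographic string comparison.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement each DNA base, then reverse the string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical_kmer(kmer: str) -> str:
    """Return the smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def kmers(text: str, k: int, canonicalize: bool = True):
    """Yield every k-mer of text, optionally in canonical form."""
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        yield canonical_kmer(kmer) if canonicalize else kmer


# Both strands of the same DNA site map to one canonical k-mer.
dna_forward = canonical_kmer("AAC")
dna_reverse = canonical_kmer("GTT")

# Amino acid text has no reverse complement, so it is taken as-is.
protein_kmers = list(kmers("MKVL", 3, canonicalize=False))
```

With canonicalization on, a DNA query matches regardless of which strand was indexed. For protein or other free text there is no second strand, so each k-mer should be indexed exactly as written.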