question about custom alphabets

Open emzodls opened this issue 6 years ago • 1 comment

Hi, great program. I have a question about using the custom alphabet. What input does it accept? Does it still assume a FASTA-like file? I'm interested in using Mash for sequences of concatenated domains. I figured I could bind each domain to a letter and map back afterwards, but this would only work with 52 domains or fewer (if I make the characters case-sensitive). Is there a way around this if I have more than 52 different types of domains? Thanks.

emzodls, Feb 28 '18 15:02

Yes, it still assumes FASTA (or FASTQ) input. All it really does is filter out k-mers containing ASCII characters that are not in the alphabet string you give it, so the alphabet could potentially have more than 52 characters. I would guess anything above ASCII 33 is safe for parsing, but I haven't actually tried anything non-alphanumeric.
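
For anyone trying this, here is a rough Python sketch of the domain-to-character mapping idea, extended past letters to other printable ASCII characters above 33. It is untested against Mash itself. The function names and the Pfam-style domain IDs are made up for illustration. Skipping `>`, `@` and `+` is my own precaution because they are FASTA/FASTQ record markers, not something Mash is documented to require. The sketch also assumes case is preserved, since otherwise upper- and lower-case symbols would collide.

```python
# Hypothetical helpers: map each domain type to one printable ASCII character
# so that a domain architecture can be written out as a FASTA "sequence".

RESERVED = ">@+"  # assumption: avoid FASTA/FASTQ record markers


def build_domain_alphabet(domains, first_code=34, last_code=126):
    """Assign one character (ASCII 34..126, minus RESERVED) per domain type."""
    symbols = [chr(c) for c in range(first_code, last_code + 1)
               if chr(c) not in RESERVED]
    unique = sorted(set(domains))
    if len(unique) > len(symbols):
        raise ValueError(
            f"{len(unique)} domain types but only {len(symbols)} symbols")
    return dict(zip(unique, symbols))


def encode(domain_seq, mapping):
    """Turn a list of domain names into a string of mapped characters."""
    return "".join(mapping[d] for d in domain_seq)


def decode(encoded, mapping):
    """Map an encoded string back to the list of domain names."""
    reverse = {symbol: domain for domain, symbol in mapping.items()}
    return [reverse[c] for c in encoded]


def to_fasta(records, mapping):
    """Format {name: [domain, ...]} as FASTA text with encoded sequences."""
    lines = []
    for name, domain_seq in records.items():
        lines.append(f">{name}")
        lines.append(encode(domain_seq, mapping))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 60 made-up domain types, i.e. more than the 52 letters available
    domain_types = [f"PF{i:05d}" for i in range(60)]
    mapping = build_domain_alphabet(domain_types)
    alphabet = "".join(mapping.values())  # candidate alphabet string
    records = {"protein_1": ["PF00003", "PF00057", "PF00003"]}
    print(alphabet)
    print(to_fasta(records, mapping), end="")
```

The joined mapping values would be the alphabet string to hand to Mash, and `decode` recovers the domain names afterwards. With the three markers excluded this gives 90 usable symbols. Some of them are shell metacharacters, so the alphabet string would need quoting on the command line.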

ondovb, Apr 05 '18 19:04