Mash
Question about custom alphabets
Hi, great program. I have a question about using the custom alphabet. What input does it accept? Does it still assume a FASTA-like file? I'm interested in using Mash on sequences of concatenated domains. I figured I could bind each domain to a letter and map back afterwards, but that only works for 52 domains or fewer (if I make the characters case-sensitive). Is there a way around this if I have more than 52 different types of domains? Thanks.
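The domain-to-character approach described in the question can be stretched past 52 symbols by drawing on printable ASCII beyond letters. Below is a minimal Python sketch, assuming one character per domain and using hypothetical helper names (`build_domain_alphabet`, `encode`, `decode`) and made-up Pfam-style IDs. As a precaution it skips `>` and `@`, because those characters open FASTA/FASTQ header lines. It also keeps both upper- and lower-case letters; if case is folded anywhere in your pipeline, `A` and `a` would collide, so drop one case from `SYMBOLS` in that situation.

```python
# Printable ASCII from '!' (33) to '~' (126), minus '>' and '@', which open
# FASTA/FASTQ header lines and are excluded here as a precaution (assumption).
SYMBOLS = [chr(c) for c in range(33, 127) if chr(c) not in ">@"]


def build_domain_alphabet(domains):
    """Assign each distinct domain one symbol, in order of first appearance."""
    mapping = {}
    for domain in domains:
        if domain not in mapping:
            if len(mapping) == len(SYMBOLS):
                raise ValueError("more distinct domains than available symbols")
            mapping[domain] = SYMBOLS[len(mapping)]
    return mapping


def encode(architecture, mapping):
    """Turn a list of domain names into a one-character-per-domain string."""
    return "".join(mapping[d] for d in architecture)


def decode(encoded, mapping):
    """Map an encoded string back to the original domain names."""
    reverse = {sym: dom for dom, sym in mapping.items()}
    return [reverse[ch] for ch in encoded]


if __name__ == "__main__":
    arch = ["PF00069", "PF07714", "PF00069"]  # hypothetical domain architecture
    alpha = build_domain_alphabet(arch)
    # Emit one FASTA record plus the alphabet string to hand to Mash.
    print(">protein1")
    print(encode(arch, alpha))
    print("alphabet:", "".join(alpha.values()))
```

This gives 92 usable symbols instead of 52. Building one mapping over all inputs before encoding keeps the symbols consistent across files, which matters if sketches are to be compared.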
Yes, it still assumes FASTA (or FASTQ) input. All the custom alphabet really does is filter out k-mers containing ASCII characters that are not in the alphabet string you give it, so the alphabet could potentially hold more than 52 characters. I would guess anything above ASCII 33 is safe for parsing, but I haven't actually tried anything non-alphanumeric.
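The filtering rule in the answer can be illustrated with a short Python sketch. This is not Mash's implementation: it ignores hashing, sketch size, case handling, and reverse complements, and `kmers_in_alphabet` is a hypothetical name. It only shows the rule that a k-mer survives when every one of its characters appears in the alphabet string.

```python
def kmers_in_alphabet(seq, k, alphabet):
    """Return the k-mers of seq whose characters all appear in alphabet."""
    allowed = set(alphabet)
    kept = []
    # Slide a window of length k across the sequence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Any character outside the alphabet disqualifies the whole k-mer.
        if all(ch in allowed for ch in kmer):
            kept.append(kmer)
    return kept
```

For example, `kmers_in_alphabet("AB!CD", 2, "ABCD")` returns `["AB", "CD"]`. The windows `"B!"` and `"!C"` are dropped because `!` is not in the alphabet. Since the check is per character, nothing limits the alphabet to letters, which is why more than 52 symbols is possible in principle.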