cdhit
When a sequence contains at least one gap '-' I get: Discarding invalid sequence or sequence without identifier and description!
Hi everyone, if my sequence contains even a single gap '-', I always get the following warning:
Discarding invalid sequence or sequence without identifier and description!
I did not set any of the options that limit the gap size or enforce a minimum alignment length (and even if I had, a single gap should not cause issues, right?). Can anyone advise me how to encode my sequences so that those containing gaps are not removed?
Thanks!
I am having the same problem. Any advice?
Hey David,
I don't know what the authors think of this but I simply removed the gaps. Just feed the methods with real sequences rather than the raw alignments.
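For anyone landing here, a minimal sketch of that gap-removal step in Python, assuming the input is plain FASTA text and that '-' is the only gap character used. The function name `strip_gaps` and the example sequences are made up for illustration; they are not part of cdhit.

```python
def strip_gaps(fasta_text: str, gap_chars: str = "-") -> str:
    """Return FASTA text with gap characters removed from sequence lines.

    Hypothetical helper, not part of cdhit. Assumes headers start with '>'.
    """
    # Translation table that deletes every character in gap_chars.
    table = str.maketrans("", "", gap_chars)
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header lines are kept untouched
        else:
            cleaned = line.translate(table)
            if cleaned:  # drop lines that consisted only of gaps
                out.append(cleaned)
    return "\n".join(out) + "\n"


# Made-up aligned input, for illustration only.
aligned = ">seq1\nMKT-AY--IAK\n>seq2\nMKTEAYQ-IAK\n"
ungapped = strip_gaps(aligned)
print(ungapped)
```

If the alignment uses another gap symbol such as '.', it can be passed through `gap_chars`, e.g. `strip_gaps(aligned, gap_chars="-.")`. Note that this discards the alignment columns, so the tool then sees the raw, unaligned sequences.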
Hi Jasiozaucha,
Thank you for your advice. I had the same problem.
Thanks for this issue. I was running into the same problem and didn't realize that it was due to gap characters. I'll remove the gaps and try again.
Having a more informative error message would be beneficial here!
Is it possible to run with gap tokens? I want to cluster gapped sequences that have already been pre-aligned (this is not a typical alignment process and can't be done internally).
If not, does anyone know of an alternative clustering tool which can handle gaps?