cdhit

When a sequence contains at least one gap '-', I get: Discarding invalid sequence or sequence without identifier and description!

Open jasiozaucha opened this issue 4 years ago • 5 comments

Hi everyone, if my sequence contains even a single gap '-', I always get the following warning:

Discarding invalid sequence or sequence without identifier and description!

I did not set any of the options that limit the gap size or enforce a minimum alignment length (and even so, a single gap should not cause issues, right?). Can anyone advise me how to encode my sequences so that sequences containing gaps are not removed?

Thanks!

jasiozaucha, Nov 07 '19 15:11

I am having the same problem. Any advice?

DavidBSauer, Nov 19 '19 17:11

Hey David,

I don't know what the authors think of this, but I simply removed the gaps. Just feed the tools the ungapped sequences rather than the raw alignment.

jasiozaucha, Nov 19 '19 18:11
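The workaround of removing the gaps can be scripted. The sketch below is a minimal, unofficial example, assuming (as reported in this thread) that the '-' character is what triggers the "Discarding invalid sequence" warning. The script name `degap.py` and the choice to also treat '.' as a gap are hypothetical; headers are passed through unchanged and records left empty after degapping are skipped.

```python
"""Remove alignment gap characters from a FASTA file before running cd-hit.

Assumption (based on this thread): records containing '-' are the ones
cd-hit discards, so gaps are deleted and headers are left untouched.
"""
import sys
from typing import Iterable, Iterator, List, Tuple

# Hypothetical choice: '.' is also treated as a gap (used by some
# alignment formats); adjust to match your aligner's output.
GAP_CHARS = "-."


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def degap(seq: str, gaps: str = GAP_CHARS) -> str:
    """Return seq with every gap character removed."""
    return seq.translate(str.maketrans("", "", gaps))


def degap_fasta(lines: Iterable[str], width: int = 60) -> List[str]:
    """Return output FASTA lines with gaps stripped and sequences rewrapped."""
    out: List[str] = []
    for header, seq in read_fasta(lines):
        clean = degap(seq)
        if not clean:
            # A record that was all gaps has no residues left to cluster.
            print(f"skipping all-gap record: {header}", file=sys.stderr)
            continue
        out.append(header)
        out.extend(clean[i:i + width] for i in range(0, len(clean), width))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python degap.py aligned.fasta > ungapped.fasta
    with open(sys.argv[1]) as handle:
        for out_line in degap_fasta(handle):
            print(out_line)
```

The ungapped file can then be passed to cd-hit as usual, e.g. `cd-hit -i ungapped.fasta -o clustered -c 0.9` (or `cd-hit-est` for nucleotide sequences). Note that cd-hit computes identity from its own alignments, so the columns of the original alignment are not preserved.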

Hi Jasiozaucha,

Thank you for your advice. I had the same problem.

changhan1110, Jun 11 '20 09:06

Thanks for this issue. I was running into the same problem and didn't realize that it was due to gap characters. I'll remove the gaps and try again.

Having a more informative error message would be beneficial here!

davised, Oct 24 '20 06:10

Is it possible to run with gap tokens? I want to cluster gapped sequences that have already been aligned (this is not a typical alignment process and can't be done internally).

If not, does anyone know of an alternative clustering tool which can handle gaps?

OWissett, Apr 10 '24 12:04
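For pre-aligned input where the gaps must be kept, one option is to compute identity directly from the aligned strings and cluster greedily. This is a minimal sketch, not cd-hit's code and not a feature of cd-hit: it assumes every sequence comes from the same alignment (equal length, '-' as the only gap character, consistent letter case). The identity definition used here (identical non-gap columns divided by the shorter ungapped length) is a hypothetical choice loosely modeled on cd-hit's default global identity, and the longest-first, compare-to-representatives-only scheme mimics cd-hit's greedy approach without its word-filter speedups.

```python
"""Greedy identity clustering of pre-aligned sequences, keeping the gaps."""
from typing import Dict, List

GAP = "-"  # assumed to be the only gap character in the alignment


def ungapped_length(seq: str) -> int:
    """Number of residues in an aligned sequence, ignoring gaps."""
    return len(seq) - seq.count(GAP)


def aligned_identity(a: str, b: str) -> float:
    """Identical non-gap columns divided by the shorter ungapped length.

    Both strings must come from the same alignment (equal length).
    Columns where both sequences have a gap never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != GAP)
    shorter = min(ungapped_length(a), ungapped_length(b))
    return matches / shorter if shorter else 0.0


def greedy_cluster(aligned: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Map each representative to its members, processing longest first.

    A sequence joins the first existing representative it matches at
    >= threshold identity; otherwise it becomes a new representative.
    Runtime is O(n * clusters), so this is only practical for modest n.
    """
    order = sorted(aligned, key=lambda k: ungapped_length(aligned[k]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for name in order:
        for rep in clusters:
            if aligned_identity(aligned[name], aligned[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```

The alignment can be loaded with the FASTA reader of your choice (keeping the '-' characters) into a `{name: aligned_sequence}` dict before calling `greedy_cluster`. Lowering `threshold` merges more sequences into fewer clusters.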