ESMFold database header issues

Open valentynbez opened this issue 3 months ago • 0 comments

When I extract FASTA from highquality_clust30, I get the following headers:

>ESMFOLD V0 PREDICTION FOR MGYP000138429313
>ESMFOLD V0 PREDICTION FOR MGYP001595280761
...

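I use FoldComp for a downstream application. By the usual FASTA convention, the sequence ID is the first whitespace-delimited word of the header line and everything after it is treated as a comment. In this case every sequence gets the ID ESMFOLD, which is not unique, while the actual unique ID (MGYP...) ends up in the comment. I can run sed on the output, but that feels hacky. The highquality_clust30.lookup file, by contrast, looks appropriate: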

0       MGYP002174220927        0
1       MGYP000064029927        0
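
For reference, the sed-style workaround mentioned above could look roughly like the Python sketch below. It assumes every header has exactly the form shown in the two examples (`>ESMFOLD V0 PREDICTION FOR <id>`). The names `rewrite_header` and `rewrite_fasta` are hypothetical helpers for illustration and are not part of FoldComp.

```python
import re

# Assumed header shape, based only on the two example headers above.
HEADER_RE = re.compile(r"^>ESMFOLD V0 PREDICTION FOR (\S+)\s*$")


def rewrite_header(line: str) -> str:
    """Turn '>ESMFOLD V0 PREDICTION FOR <id>' into '><id>'.

    Lines that do not match the assumed pattern (sequence lines,
    headers in another format) are returned unchanged.
    """
    match = HEADER_RE.match(line.rstrip("\r\n"))
    if match is None:
        return line
    return f">{match.group(1)}\n"


def rewrite_fasta(lines):
    """Lazily rewrite the headers of a FASTA stream, line by line."""
    for line in lines:
        yield rewrite_header(line)
```

It can be applied to a file with `out.writelines(rewrite_fasta(inp))`, where `inp` and `out` are open text file handles. This only patches the extracted output after the fact; it does not change what FoldComp itself writes.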

Do you have recommendations on how to get proper FASTA headers?

Cheers, V

valentynbez, Mar 25 '24 16:03