foldcomp
ESMFold database header issues
When I extract FASTA from highquality_clust30, I receive the following headers:
>ESMFOLD V0 PREDICTION FOR MGYP000138429313
>ESMFOLD V0 PREDICTION FOR MGYP001595280761
...
I use FoldComp for a downstream application. Under the usual FASTA convention, the sequence ID is the first whitespace-delimited token of the header, so every sequence here gets the ID ESMFOLD, which is not unique. The actual unique ID ends up in the comment part of the header.
I can run sed on it, but this solution feels hacky.
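For reference, the same workaround can be sketched in Python. This is only a minimal sketch, assuming every header follows exactly the pattern shown above; fix_header and rewrite_fasta are hypothetical helper names and not part of FoldComp.

```python
import re

# Assumption: headers look exactly like the examples above,
# i.e. ">ESMFOLD V0 PREDICTION FOR <ID>".
HEADER_RE = re.compile(r"^>ESMFOLD V0 PREDICTION FOR (\S+)\s*$")


def fix_header(line: str) -> str:
    """Promote the trailing ID to the header; pass other lines through unchanged."""
    match = HEADER_RE.match(line)
    if match:
        return ">" + match.group(1) + "\n"
    return line


def rewrite_fasta(lines):
    """Lazily rewrite the headers of an iterable of FASTA lines."""
    for line in lines:
        yield fix_header(line)
```

It could be used as a filter, for example with sys.stdout.writelines(rewrite_fasta(sys.stdin)). Headers that do not match the assumed pattern are left untouched, so nothing is silently mangled.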
The highquality_clust30.lookup looks appropriate:
0 MGYP002174220927 0
1 MGYP000064029927 0
Do you have recommendations on how to get proper FASTA headers?
Cheers V
Sorry for the late response. In 412c7a8 I changed the default to use the id/filename when extracting sequences, and introduced a use-title flag for cases where the title is needed.