Parsing: fix Stockholm for Clustal Omega output
Clustal Omega writes files like this:
# STOCKHOLM 1.0
#=GS Q8BX79|reviewed|Probable DE G-protein coupled receptor 21|taxID:10090
...
Q8BX79|reviewed|Probable -----MNSTWDGN---------QSSHPFCLLAL-------GYLETVRFCL
...
The "|reviewed|Probable" part of the identifier ended up getting slurped into the sequence. This commit searches for the final space in the sequence line and treats only what follows it as the sequence.
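To illustrate the last-space approach, here is a minimal sketch in Python rather than the package's Julia. The function name `split_sequence_line` is hypothetical and is not taken from MIToS.jl. It assumes that the identifier and the sequence are separated by spaces and that the sequence itself contains none.

```python
from typing import Tuple


def split_sequence_line(line: str) -> Tuple[str, str]:
    """Split an alignment line into (identifier, sequence) at the final space.

    Hypothetical helper for illustration only. If the line contains no
    space, the identifier is empty and the whole line is returned as the
    sequence.
    """
    stripped = line.rstrip()  # drop the trailing newline and padding
    identifier, _, sequence = stripped.rpartition(" ")
    # Remove any extra padding between the identifier and the sequence.
    return identifier.rstrip(), sequence


line = "Q8BX79|reviewed|Probable -----MNSTWDGN---------QSSHPFCLLAL-------GYLETVRFCL"
identifier, sequence = split_sequence_line(line)
```

Because the split happens at the last space, pipes or other punctuation inside the identifier no longer matter.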
CC @tmcgrath325
Hi @timholy!
Thanks for pointing this out! I think the real issue is that I used a pipe in the regex on the following line to mean spaces OR tabs, but since it is inside square brackets, it is treated as a literal pipe, which causes the bug:
https://github.com/diegozea/MIToS.jl/blob/717d6f9306ca743ed66f0f94c023fcbeb47c00d4/src/Utils/GeneralUtils.jl#L39
So the more straightforward solution should be to delete the pipe from that regex.
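The character-class behavior can be reproduced in a small sketch, written here in Python rather than Julia. The two patterns are hypothetical stand-ins and are not copied from the linked line in GeneralUtils.jl. They only show that `|` inside square brackets is an ordinary character rather than alternation.

```python
import re

# Hypothetical patterns for illustration; not the actual MIToS.jl regex.
BUGGY = re.compile(r"[ |\t]+")  # the pipe is a literal member of the class
FIXED = re.compile(r"[ \t]+")   # matches only spaces and tabs

line = "Q8BX79|reviewed|Probable -----MNSTWDGN"

# With the pipe in the class, the identifier is cut at every "|".
buggy_fields = BUGGY.split(line)

# Without it, the line splits only at the whitespace before the sequence.
fixed_fields = FIXED.split(line)
```

With the buggy pattern the identifier is broken into three fields, while the fixed pattern yields just the identifier and the sequence.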
Cheers
Good suggestion, thanks!
Thanks again for finding and fixing this buggy parser behavior.