
Parsing: fix Stockholm for Clustal Omega output

Open: timholy opened this issue.

Clustal Omega writes files like this:

  # STOCKHOLM 1.0

  #=GS    Q8BX79|reviewed|Probable DE    G-protein coupled receptor 21|taxID:10090
    ...
  Q8BX79|reviewed|Probable     -----MNSTWDGN---------QSSHPFCLLAL-------GYLETVRFCL
    ...

The "|reviewed|Probable" part of the sequence name ended up getting slurped into the sequence. This commit searches for the final space in the sequence line and treats only what follows it as the sequence.
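A minimal sketch of the "split at the final space" idea, written in Python rather than Julia purely for illustration. The helper name `split_sequence_line` is hypothetical and is not part of MIToS.jl. It assumes a sequence line is a name followed by whitespace and then the aligned sequence, which itself contains no whitespace.

```python
def split_sequence_line(line: str) -> tuple[str, str]:
    """Split a Stockholm sequence line into (name, aligned_sequence).

    Hypothetical helper: everything after the LAST run of whitespace is
    taken as the sequence; everything before it is the name, so pipes
    (or even spaces) inside the name are left alone.
    """
    # rsplit(None, 1) splits once, at the last run of whitespace,
    # and ignores trailing whitespace such as a newline.
    parts = line.rsplit(None, 1)
    if len(parts) != 2:
        raise ValueError(f"no name/sequence pair in line: {line!r}")
    name, sequence = parts
    # Leading whitespace is preserved on the left part, so strip it.
    return name.strip(), sequence


name, seq = split_sequence_line(
    "Q8BX79|reviewed|Probable     -----MNSTWDGN---------QSSHPFCLLAL-------GYLETVRFCL"
)
print(name)  # Q8BX79|reviewed|Probable
```

Splitting from the right keeps the whole "Q8BX79|reviewed|Probable" identifier as the name, which is the behavior the commit aims for.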

timholy, May 12 '24

CC @tmcgrath325

timholy, May 12 '24

Hi @timholy!

Thanks for pointing this out! I think the real issue is that, in the regex on the following line, I tried to use the pipe to mean "space OR tab". Because the pipe is inside square brackets, it is treated as a literal pipe character, which causes the bug:

https://github.com/diegozea/MIToS.jl/blob/717d6f9306ca743ed66f0f94c023fcbeb47c00d4/src/Utils/GeneralUtils.jl#L39

So, the more straightforward solution would be to delete the pipe from that regex.
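A short illustration of why a pipe inside square brackets misbehaves, using Python's `re` module (Julia's `Regex` is PCRE-based and treats character classes the same way). The exact pattern on the linked line is not reproduced here; `[ |\t]+` and `[ \t]+` below are hypothetical stand-ins for the buggy and fixed separators.

```python
import re

line = "Q8BX79|reviewed|Probable     -----MNSTWDGN---------QSSHPFCLLAL-------GYLETVRFCL"

# Intended as "space OR tab", but inside [...] the | is just another
# member of the set, so this also matches a literal pipe.
buggy = re.compile(r"[ |\t]+")
# A character class already means "any one of these", so no | is needed.
fixed = re.compile(r"[ \t]+")

# Split once into (name, sequence).
buggy_parts = buggy.split(line, maxsplit=1)
fixed_parts = fixed.split(line, maxsplit=1)

print(buggy_parts[0])  # Q8BX79  (the rest of the name leaks into the sequence)
print(fixed_parts[0])  # Q8BX79|reviewed|Probable
```

Outside a class, alternation such as `( |\t)` would express the same intent, but the bracketed class without the pipe is the simpler fix.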

Cheers

diegozea, Jun 01 '24

Good suggestion, thanks!

timholy, Jun 06 '24

Thanks again for finding and fixing this parser bug.

diegozea, Jun 09 '24