[BUG] GroTop parser can't handle negative residue numbers
All residue numbers are interpreted as positive integers, leading to duplicate matches when deduplicating, i.e. -1 gets converted to 1, so there would be two residue indices with number 1 if there was a positive 1 elsewhere. This is easily fixed by updating the regex.
I've added a fix to handle negative residue numbers in the parser here. This appears to work and I see the correct numbers on load, e.g.:
In [1]: import sire as sr
In [2]: mols = sr.stream.load("good_resid.bss")
In [3]: sr.save(mols, "test", ["grotop", "gro87"])
Out[3]:
['/home/lester/Code/openbiosim/sire/test.grotop',
'/home/lester/Code/openbiosim/sire/test.gro87']
In [4]: new_mols = sr.load("test.gro*")
In [5]: new_mols.residues()
Out[5]:
SireMol::SelectorM<SireMol::Residue>( size=3
0: MolNum(3) Residue( XXX:-1 num_atoms=4 )
1: MolNum(3) Residue( XXX:0 num_atoms=7 )
2: MolNum(3) Residue( XXX:1 num_atoms=9 )
)
However, in Python-land it appears that it is not possible to create a negative ResNum, e.g:
In [1]: from sire.legacy.Mol import ResNum
In [2]: ResNum(-1)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Cell In[2], line 1
----> 1 ResNum(-1)
OverflowError: can't convert negative value to unsigned int
As such, it looks like it would be dangerous to have a negative residue number floating around, e.g. if we wanted to set one residue number to the value of another, etc. (I came across this issue when trying to create a minimal test file.)
@chryswoods: Any suggestions? I've certainly not come used negative residue numbers before so assume that they aren't generally supported by the other parsers too. Since they do want to preserve the residue number I can only suggest that they do this with a mapping, i.e. store a mapping between the index and residue number in their original system, then use that to refer to the residues going forward.
It is possible to construct a new ResNum using an existing one with value -1, e.g. res_num = ResNum(negative_res_num), but explicitly passing a negative integer doesn't work:
C++ signature:
__init__(_object*, SireMol::ResNum other)
__init__(_object*, unsigned int num)
__init__(_object*)
I don't think there is a reason to prevent having a negative ResNum. Looking at the code, ResNum inherits from SireID::Number, which holds signed qint32. The only thing blocking a negative number is that the constructor for ResNum is
explicit ResNum(quint32 num);
which constrains to a positive, unsigned number.
I think changing this to
explicit ResNum(qint32 num);
and updating the wrappers would do it. And I don't think it would break other code.
The only thing I would worry about is whether negative numbers are disallowed for various parsers - thinking of PDB or AmberPrm here... We will find out via trial and error ;-)
Thanks, I'll give that a go.
The only thing I would worry about is whether negative numbers are disallowed for various parsers - thinking of PDB or AmberPrm here... We will find out via trial and error ;-)
Yes, that's what I was worried about too. We do have the internal function (used in places by BioSimSpace) that can renumber everything in ascending order, so I guess that could always be used in problem cases. It's just tricky if the number is important, i.e. the user wants to preserve it, but not supported by a particular format. In that case, I think the suggested mapping would be the way forward.
This works. I'll test further to see if it causes issues with other parsers, etc.