sire icon indicating copy to clipboard operation
sire copied to clipboard

[BUG] GroTop parser can't handle negative residue numbers

Open lohedges opened this issue 1 year ago • 5 comments

All residue numbers are interpreted as positive integers, leading to duplicate matches when deduplicating, i.e. -1 gets converted to 1, so there would be two residue indices with number 1 if there was a positive 1 elsewhere. This is easily fixed by updating the regex.

lohedges avatar Feb 25 '25 11:02 lohedges

I've added a fix to handle negative residue numbers in the parser here. This appears to work and I see the correct numbers on load, e.g.:

In [1]: import sire as sr

In [2]: mols = sr.stream.load("good_resid.bss")

In [3]: sr.save(mols, "test", ["grotop", "gro87"])
Out[3]:
['/home/lester/Code/openbiosim/sire/test.grotop',
 '/home/lester/Code/openbiosim/sire/test.gro87']

In [4]: new_mols = sr.load("test.gro*")

In [5]: new_mols.residues()
Out[5]:
SireMol::SelectorM<SireMol::Residue>( size=3
0: MolNum(3) Residue( XXX:-1  num_atoms=4 )
1: MolNum(3) Residue( XXX:0   num_atoms=7 )
2: MolNum(3) Residue( XXX:1   num_atoms=9 )
)

However, in Python-land it appears that it is not possible to create a negative ResNum, e.g:

In [1]: from sire.legacy.Mol import ResNum

In [2]: ResNum(-1)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[2], line 1
----> 1 ResNum(-1)

OverflowError: can't convert negative value to unsigned int

As such, it looks like it would be dangerous to have a negative residue number floating around, e.g. if we wanted to set one residue number to the value of another, etc. (I came across this issue when trying to create a minimal test file.)

@chryswoods: Any suggestions? I've certainly not come used negative residue numbers before so assume that they aren't generally supported by the other parsers too. Since they do want to preserve the residue number I can only suggest that they do this with a mapping, i.e. store a mapping between the index and residue number in their original system, then use that to refer to the residues going forward.

lohedges avatar Feb 25 '25 11:02 lohedges

It is possible to construct a new ResNum using an existing one with value -1, e.g. res_num = ResNum(negative_res_num), but explicitly passing a negative integer doesn't work:

 C++ signature:
    __init__(_object*, SireMol::ResNum other)
    __init__(_object*, unsigned int num)
    __init__(_object*)

lohedges avatar Feb 25 '25 12:02 lohedges

I don't think there is a reason to prevent having a negative ResNum. Looking at the code, ResNum inherits from SireID::Number, which holds signed qint32. The only thing blocking a negative number is that the constructor for ResNum is

explicit ResNum(quint32 num);

which constrains to a positive, unsigned number.

I think changing this to

explicit ResNum(qint32 num);

and updating the wrappers would do it. And I don't think it would break other code.

The only thing I would worry about is whether negative numbers are disallowed for various parsers - thinking of PDB or AmberPrm here... We will find out via trial and error ;-)

chryswoods avatar Feb 28 '25 08:02 chryswoods

Thanks, I'll give that a go.

The only thing I would worry about is whether negative numbers are disallowed for various parsers - thinking of PDB or AmberPrm here... We will find out via trial and error ;-)

Yes, that's what I was worried about too. We do have the internal function (used in places by BioSimSpace) that can renumber everything in ascending order, so I guess that could always be used in problem cases. It's just tricky if the number is important, i.e. the user wants to preserve it, but not supported by a particular format. In that case, I think the suggested mapping would be the way forward.

lohedges avatar Feb 28 '25 09:02 lohedges

This works. I'll test further to see if it causes issues with other parsers, etc.

lohedges avatar Mar 03 '25 16:03 lohedges