rdkit icon indicating copy to clipboard operation
rdkit copied to clipboard

Move some fields and methods from AtomPDBResidueInfo to base class

Open rachelnwalker opened this issue 2 months ago • 7 comments

What does this implement/fix? Explain your changes.

Currently, information about monomers can be set on an atom using the AtomMonomerInfo class which is accessed and set by get/setMonomerInfo. Some tools in RDKit (namely the PDB & sequence/HELM readers and writers) expect the subclass AtomPDBResidueInfo to be used instead as it holds PDB specific information.

To prepare for more adding more monomer support into RDKit, @greglandrum suggested we move some of the fields and members from the subclass to AtomMonomerInfo that are relevant beyond just structures read from PDB. I moved chainId, residueName and residueNumber and expanded the AtomMonomerType enum to include AMINO_ACID and NUCLEIC_ACID. I also added a new field monomerClass, which I think may be useful to store more specific information about how a monomer is classified (such as DNA, RNA, SUGAR, BASE, PHOSPHATE, LINKER, CHEM, LGRP).

Any other comments?

I'm marking this as a draft because I'm not sure how to move forward with supporting new fields in the MolPickler without breaking compatibility with old pickles.

There are a few more things I need to do here:

  • support new fields in AtomMonomerInfo in the molpickler
  • update python wrappers
  • add tests
  • I would also like to bring back https://github.com/rdkit/rdkit/pull/8219 & add some tests

rachelnwalker avatar Oct 11 '25 00:10 rachelnwalker

  • support new fields in AtomMonomerInfo in the molpickler

For this I think you're going to need to bump the pickle version and update the reading code to be able to handle both the new version and the older one. Since we only write one version, no changes to the writer are needed here.

greglandrum avatar Oct 14 '25 16:10 greglandrum

greglandrum avatar Oct 14 '25 16:10 greglandrum

@greglandrum sorry for the delay. I updated the pickling code:

  • it always writes the new tags (BEGIN_ATOM_MONOMER_INFO through END_ATOM_MONOMER_INFO) if MonomerType != PDBRESIDUE, otherwise the writing is the same as before with the added ATOM_PDB_RESIDUE_MONOMERCLASS field)
  • if BEGIN_PDB_RESIDUE is detected, the reader will unpickle using the old method and if BEGIN_ATOM_MONOMER_INFO is detected, it will unpickle using the new method

The test I wrote passes and I think this should be sufficient to preserve old pickles since the behavior only changes if new tags are detected, but I'm not completely sure. I see some regression tests are failing, but I'm not sure if it is because I broke pickling or because I need to update pickle references somewhere because I bumped the version number

rachelnwalker avatar Oct 28 '25 15:10 rachelnwalker

The test I wrote passes and I think this should be sufficient to preserve old pickles since the behavior only changes if new tags are detected, but I'm not completely sure. I see some regression tests are failing, but I'm not sure if it is because I broke pickling or because I need to update pickle references somewhere because I bumped the version number

The cartridge test is failing because it includes the actual pickle output in the test results. This is suboptimal since it causes those tests to fail whenever the pickle version number is changed. We should update those tests to not have this happen, but as far as this PR goes, the solution is to update the expected results file Code/PgSQL/rdkit/expected/xqm.out with the new values. I think you should be able to see them here: https://dev.azure.com/rdkit-builds/RDKit/_build/results?buildId=7748&view=logs&jobId=072c6a9a-9194-51d9-7744-c167ca2015ce&j=072c6a9a-9194-51d9-7744-c167ca2015ce&t=3e9caea3-19a4-558e-d3d8-93be150120e7

greglandrum avatar Oct 28 '25 20:10 greglandrum

The test I wrote passes and I think this should be sufficient to preserve old pickles since the behavior only changes if new tags are detected, but I'm not completely sure. I see some regression tests are failing, but I'm not sure if it is because I broke pickling or because I need to update pickle references somewhere because I bumped the version number

The cartridge test is failing because it includes the actual pickle output in the test results. This is suboptimal since it causes those tests to fail whenever the pickle version number is changed. We should update those tests to not have this happen, but as far as this PR goes, the solution is to update the expected results file Code/PgSQL/rdkit/expected/xqm.out with the new values. I think you should be able to see them here: https://dev.azure.com/rdkit-builds/RDKit/_build/results?buildId=7748&view=logs&jobId=072c6a9a-9194-51d9-7744-c167ca2015ce&j=072c6a9a-9194-51d9-7744-c167ca2015ce&t=3e9caea3-19a4-558e-d3d8-93be150120e7

If you don't have access to that page, let me know and I can do a PR against your branch with the changes.

greglandrum avatar Oct 28 '25 20:10 greglandrum

@greglandrum Just updated the cartridge test, but now I am seeing this failure on the CI -- is this random?

======================================================================
FAIL: testNonUniqueCrash (__main__.TestCase.testNonUniqueCrash)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vsts/work/1/s/Code/SimDivPickers/Wrap/testPickers.py", line 141, in testNonUniqueCrash
    self.assertTrue(tuple(mm2) != tuple(mm1)) or (tuple(mm3) != tuple(mm1))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: False is not true

Question for @tadhurst-cdd -- do you think the fields added in AtomMonomerInfo here are sufficient for what you will need for represent an 'SCSRMol'?

rachelnwalker avatar Nov 18 '25 06:11 rachelnwalker

@greglandrum Just updated the cartridge test, but now I am seeing this failure on the CI -- is this random?

======================================================================
FAIL: testNonUniqueCrash (__main__.TestCase.testNonUniqueCrash)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/vsts/work/1/s/Code/SimDivPickers/Wrap/testPickers.py", line 141, in testNonUniqueCrash
    self.assertTrue(tuple(mm2) != tuple(mm1)) or (tuple(mm3) != tuple(mm1))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: False is not true

yep, random

greglandrum avatar Nov 18 '25 16:11 greglandrum