Inconsistent resnames when loading pdb and mol2 with repeat resnums

zwsmith200 opened this issue.

Expected behavior

I am using a PDB file that appears to have been prepared incorrectly: it assigns residue number 200 to both the ligand and a protein residue. When loading the PDB, MDAnalysis appears to retain the original ligand resname (CNA). I need the fragment functionality (which requires bonds), so I have prepared a MOL2 file for this system, and it has the same numbering error. I expect loading this MOL2 file to give the ligand the same resname as loading the PDB does.

Actual behavior

When I load the MOL2 file, the ligand's resname is replaced by the resname of the protein residue with resid 200 (TRP200).

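import MDAnalysis as mda

# PDB: the ligand keeps its resname (CNA)
u = mda.Universe('1a26A.pdb')
u.atoms.resnames

# MOL2: the ligand atoms report TRP instead of CNA
u = mda.Universe('1a26A.mol2')
u.atoms.resnames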

Current version of MDAnalysis

Version 2.1.0, on both macOS and Linux.

1a26.zip

May 05 '22 20:05 zwsmith200

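This appears to be caused by the PDB parser using change_squash while the MOL2 parser uses squash_by to build residues. Because the ligand is at the end of the file, its atoms come with a resid change from 351 to 200. change_squash starts a new residue at every such change, so the ligand becomes its own residue and keeps CNA. squash_by instead groups atoms by resid value alone, so the ligand atoms are merged into the existing residue 200 and take its resname (TRP).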
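To make the difference concrete, here is a minimal sketch of the two grouping strategies on a toy five-atom system. The functions `group_on_change` and `group_by_resid` are hypothetical, simplified stand-ins written for illustration; they are not MDAnalysis's `change_squash`/`squash_by` code (the real PDB path, for instance, also considers insertion codes and segids).

```python
import numpy as np

# Hypothetical, simplified stand-ins for the two strategies; not MDAnalysis code.

def group_on_change(resids, resnames):
    """Start a new residue whenever (resid, resname) differs from the previous atom."""
    keys = list(zip(resids, resnames))
    residx = np.zeros(len(keys), dtype=int)
    for i in range(1, len(keys)):
        residx[i] = residx[i - 1] + (keys[i] != keys[i - 1])
    # Index of the first atom of each run; that atom supplies the residue's resname.
    first = np.r_[0, np.nonzero(np.diff(residx))[0] + 1]
    return residx, np.asarray(resnames)[first]

def group_by_resid(resids, resnames):
    """Group atoms by resid value alone; each residue takes its first atom's resname."""
    _, first, residx = np.unique(resids, return_index=True, return_inverse=True)
    return residx, np.asarray(resnames)[first]

# Toy layout mimicking the file: protein ..., 200 (TRP), ..., 351, then ligand with resid 200.
resids = [199, 200, 200, 351, 200]
resnames = ["ALA", "TRP", "TRP", "GLY", "CNA"]

idx, names = group_on_change(resids, resnames)
print(names[idx].tolist())  # ['ALA', 'TRP', 'TRP', 'GLY', 'CNA']

idx, names = group_by_resid(resids, resnames)
print(names[idx].tolist())  # ['ALA', 'TRP', 'TRP', 'GLY', 'TRP']
```

In the second case the ligand atom lands in the same residue as TRP200, so per-atom resnames report TRP, matching the behavior seen with the MOL2 file.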

It would be good for squash_by to emit a warning when resids are non-sequential (for example, when a resid reappears after other resids), so that nobody gets burned by this when processing a large database of files.
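One possible shape for such a check is sketched below. `check_resids_contiguous` is a hypothetical helper, not part of MDAnalysis; it flags only resids that occur in more than one contiguous block of atoms, since that is the case where grouping by resid merges unrelated atoms. A stricter variant could warn on any decrease in resid (such as 351 to 200), but that would also fire on files where the jump is harmless.

```python
import warnings
from itertools import groupby

def check_resids_contiguous(resids):
    """Warn if any resid occurs in more than one contiguous run of atoms.

    Hypothetical helper illustrating the proposed warning; not MDAnalysis API.
    Returns the sorted list of resids that reappear after a different resid.
    """
    seen = set()
    repeated = set()
    # groupby collapses consecutive equal resids into a single run.
    for resid, _run in groupby(resids):
        if resid in seen:
            repeated.add(resid)
        seen.add(resid)
    if repeated:
        warnings.warn(
            "resids {} occur in non-contiguous blocks; atoms sharing a resid "
            "will be merged into a single residue".format(sorted(repeated)),
            UserWarning,
            stacklevel=2,
        )
    return sorted(repeated)

# Ligand (resid 200) appended after resid 351 triggers the warning:
print(check_resids_contiguous([199, 200, 200, 351, 200]))  # [200]
```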

May 06 '22 22:05 zwsmith200