Inconsistent resnames when loading pdb and mol2 with repeat resnums
Expected behavior
I am using a PDB file that appears to have been prepared incorrectly: residue number 200 is used for both the ligand and a protein residue. When loading the PDB, MDA appears to retain the original ligand resname (CNA). I need the fragment functionality, so I have prepared a MOL2 file for this system, which has the same numbering error. I expected loading this MOL2 file to give the same ligand resname as the PDB.
Actual behavior
When I load the MOL2 file, the resname for the ligand is replaced by the resname of the protein residue numbered 200 (TRP200).
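import MDAnalysis as mda
u = mda.Universe('1a26A.pdb')
u.atoms.resnames  # ligand keeps its original resname (CNA)
u = mda.Universe('1a26A.mol2')
u.atoms.resnames  # ligand resname is replaced by that of TRP200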
Current version of MDAnalysis
Version 2.1.0 on both macOS and Linux.
This appears to be a difference caused by using change_squash for PDB and squash_by for MOL2. Since the ligand is at the end of the file, it is accompanied by a resid change from 351 to 200. With change_squash, that change prompts the ligand to be squashed into a separate residue. squash_by instead appears to group atoms by resid alone, so the non-unique resid 200 puts the ligand in the same residue as TRP200.
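The difference between the two grouping strategies can be sketched in plain Python. This is a simplified model, not the MDAnalysis implementation: the helper names `group_by_change` and `group_by_unique` are hypothetical, the per-atom data are made up, and the resname used for residue 351 is a placeholder.

```python
from itertools import groupby

# Toy per-atom data (made up): two protein residues, then a ligand that
# reuses resid 200 at the end of the file. "ALA" for 351 is a placeholder.
resids = [200, 200, 351, 351, 200, 200]
resnames = ["TRP", "TRP", "ALA", "ALA", "CNA", "CNA"]


def group_by_change(resids, resnames):
    """Start a new residue whenever (resid, resname) changes between consecutive atoms."""
    return [key for key, _ in groupby(zip(resids, resnames))]


def group_by_unique(resids, resnames):
    """One residue per unique resid; the resname is taken from the first atom seen."""
    first_name = {}
    for resid, resname in zip(resids, resnames):
        first_name.setdefault(resid, resname)
    return sorted(first_name.items())


print(group_by_change(resids, resnames))  # ligand stays a separate residue
print(group_by_unique(resids, resnames))  # ligand is absorbed into resid 200
```

In this model the change-based grouping yields three residues and keeps CNA, while the unique-based grouping yields two residues and the ligand atoms inherit TRP, which matches the behavior reported above.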
It would be good to have a warning in squash_by for non-sequential resids to avoid someone getting burned by this when using a large database of files.
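As a rough illustration of what such a check could look like, here is a minimal sketch in plain Python. The function `warn_on_repeated_resids` is hypothetical and not part of MDAnalysis; it only assumes that resids are available as a per-atom sequence in file order.

```python
import warnings


def warn_on_repeated_resids(resids):
    """Hypothetical helper: warn if a resid reappears after a different resid.

    Returns the list of resids that occur in more than one contiguous run.
    """
    seen = set()
    repeated = []
    previous = None
    for resid in resids:
        if resid != previous:
            # A new run of atoms starts here; flag it if this resid was used before.
            if resid in seen and resid not in repeated:
                repeated.append(resid)
            seen.add(resid)
            previous = resid
    if repeated:
        warnings.warn(
            f"Non-contiguous resids found: {repeated}; "
            "atoms may be merged into the wrong residue."
        )
    return repeated


# Example: resid 200 is used by a protein residue and again by a trailing ligand.
print(warn_on_repeated_resids([200, 200, 351, 351, 200, 200]))
```

Whether the check belongs inside squash_by or in the MOL2 parser is a design question for the maintainers; the sketch only shows that the condition is cheap to detect in a single pass.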