Odd data in PDBBind2016

Open hnisonoff opened this issue 2 years ago • 14 comments

I wanted to flag some oddities with the PDBBind2016 dataset. I've tried to recompute the RMSDs and noticed that a very large fraction do not match the values in the data. One particularly odd example I found was 5c28, where the docked ligand is a different molecule from the crystal ligand. Is there by any chance a cleaner version of the PDBBind docked dataset that could be used?

hnisonoff avatar Jul 17 '22 00:07 hnisonoff

The data we used was downloaded directly from the PDB at the time of data creation. We only took the affinity numbers from PDBbind, matching PDB+ligname from PDBbind to what was identified via Pocketome.

Could you give more information about this example (Pocket & specific files)?

francoep avatar Jul 26 '22 21:07 francoep

Or did I misunderstand, and you meant exactly the PDBbind2016 data?

francoep avatar Jul 26 '22 21:07 francoep

Additionally, looking at the 5c28 example you mentioned, the molecules are the same. Could you provide more examples of faulty data, and also describe how you are trying to re-compute the RMSDs?

francoep avatar Jul 27 '22 14:07 francoep

I am describing the 5c28 example from PDBbind2016.tar.gz. I just double-checked by redownloading everything.

Here is an image showing what happens when I load the docked molecule and the one labeled as ligand. As you can see they are different.

image

hnisonoff avatar Jul 27 '22 19:07 hnisonoff

Here is the directory (5c28.zip). The files are 5c28_docked.sdf and 5c28_ligand.sdf.

hnisonoff avatar Jul 27 '22 19:07 hnisonoff

Doesn't look different to me. Looks like the same molecule rotated 180 degrees.

dkoes avatar Jul 27 '22 19:07 dkoes

Sorry, you are right. I apologize for the inconvenience. For the RMSDs I was using RDKit's CalcRMS and spyrmsd. I'll go back and fill this out with more detail. Sorry again for the incorrect bug report.

hnisonoff avatar Jul 27 '22 20:07 hnisonoff

For the RMSDs that we reported, we used obrms to calculate them (comes when you install openbabel).

francoep avatar Jul 27 '22 20:07 francoep

CalcRMS does not do symmetry correction. spyrmsd is supposed to. I'd be interested in seeing examples where obrms and spyrmsd differ.
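To make "symmetry correction" concrete, here is a minimal sketch in plain Python. It is not the implementation used by obrms, spyrmsd, or RDKit. The function names and the toy coordinates are made up for illustration, and the list of equivalent atom orderings is written by hand here, whereas real tools typically derive it from the molecular graph.

```python
import math

def rmsd_in_place(ref, probe, mapping=None):
    """RMSD between two coordinate lists without moving either one.

    mapping[i] is the index of the probe atom paired with reference atom i.
    With no mapping, atoms are paired by their order in the lists.
    """
    if mapping is None:
        mapping = range(len(ref))
    sq = 0.0
    for i, j in enumerate(mapping):
        sq += sum((a - b) ** 2 for a, b in zip(ref[i], probe[j]))
    return math.sqrt(sq / len(ref))

def symmetry_corrected_rmsd(ref, probe, equivalent_mappings):
    """Smallest in-place RMSD over all supplied equivalent atom orderings."""
    return min(rmsd_in_place(ref, probe, m) for m in equivalent_mappings)

# Toy carboxylate-like fragment: one carbon and two chemically equivalent oxygens.
ref = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
# Identical pose, but the two oxygens are listed in swapped order.
probe = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]

identity = [0, 1, 2]
swap_oxygens = [0, 2, 1]  # hand-written symmetry mapping (assumption)

naive = rmsd_in_place(ref, probe)
corrected = symmetry_corrected_rmsd(ref, probe, [identity, swap_oxygens])
print(f"naive: {naive:.3f}  symmetry-corrected: {corrected:.3f}")
```

The naive value is nonzero even though the two poses overlap exactly, which is the kind of discrepancy that can appear when two tools treat equivalent atoms differently.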

dkoes avatar Jul 27 '22 20:07 dkoes

I'll try to dig up examples.

For CalcRMS, the documentation says: "Note: This function will attempt to align all permutations of matching atom orders in both molecules"

Doesn't this imply it does symmetry correction?

hnisonoff avatar Jul 27 '22 20:07 hnisonoff

Huh, you're right - that's what the documentation says. Not sure why there is also GetBestRMS then. Maybe historically CalcRMS didn't do that?

dkoes avatar Jul 27 '22 20:07 dkoes

GetBestRMS aligns the probe molecule to the reference in space and returns the RMSD after that alignment. CalcRMS calculates the RMSD in place, without moving either molecule.
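The difference between an in-place RMSD and an RMSD after alignment can be sketched with a 2D toy in plain Python. This is only an illustration under simplifying assumptions: it works in 2D, where the optimal rotation has a closed form, it pairs atoms by list order, and the helper names are hypothetical. It is not how RDKit or obrms compute their values.

```python
import math

def rmsd(a, b):
    """RMSD between two 2D point lists, paired by list order, nothing moved."""
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(sq / len(a))

def centered(points):
    """Translate points so that their centroid sits at the origin."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]

def aligned_rmsd_2d(ref, probe):
    """RMSD after optimally translating and rotating probe onto ref (2D only)."""
    r, p = centered(ref), centered(probe)
    # The rotation angle that maximizes the overlap sum(r . R(theta) p)
    # is atan2(cross, dot) of the paired, centered coordinates.
    dot = sum(px * rx + py * ry for (px, py), (rx, ry) in zip(p, r))
    cross = sum(px * ry - py * rx for (px, py), (rx, ry) in zip(p, r))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x - s * y, s * x + c * y) for x, y in p]
    return rmsd(r, rotated)

# Reference triangle, and the same triangle rotated by 90 degrees and shifted.
ref = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
probe = [(3.0, 0.0), (3.0, 1.0), (2.0, 1.0)]

in_place = rmsd(ref, probe)            # sensitive to where the probe sits
aligned = aligned_rmsd_2d(ref, probe)  # only sensitive to shape differences
print(f"in place: {in_place:.3f}  after alignment: {aligned:.3f}")
```

For docked poses the in-place value is usually the one of interest, because it measures how far the pose sits from the crystal ligand in the binding site, while the aligned value only reports conformational differences.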

drewnutt avatar Jul 28 '22 20:07 drewnutt

I suppose that 'obrms' also calculates the RMSD without moving either molecule?

JonasLi-19 avatar Jul 21 '23 06:07 JonasLi-19

Yes, unless -m is passed.

dkoes avatar Jul 21 '23 11:07 dkoes