
About ligand pose file naming and actual ligand PDB IDs

Open yemanbh opened this issue 3 months ago • 3 comments

I am trying to understand the CrossDocked2020 dataset. In the dataset, multiple ligand poses are associated with a given receptor. As an example, I show the receptor and ligand file paths associated with the 3gan chain A receptor and the cps ligand molecule.

I have attached an interactive visualisation of the pocket-ligand pairs here: 3gan-A.html

Here are overlaid visualisations of the different receptor files corresponding to 3gan and of the cps ligand files provided.

Image Image

The pockets show some degree of overlap.

For the molecules, however, even though the files have the same PDB ID in their names, their size and structure differ, ranging from 14 to 29 atoms as shown below. The bonds and atom types are also inconsistent across the different molecules/poses.

Image

Question: Since the ligand name is the same for all these files, and if these poses were generated by AutoDock Vina, aren't they supposed to have the same number of atoms and differ only in geometry?

Sample file names visualised in the above images.

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_90_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_90.sdf

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_9_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_9.sdf

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_196_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_196.sdf

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_289_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_289.sdf

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_10_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_10.sdf

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_162_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_162.sdf

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_it1_it2_tt_docked_0_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_it1_it2_tt_docked_0.sdf

Receptor:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_286_pocket10.pdb, Ligand:RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_286.sdf
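For reference, the sample names above can be split into fields with a small parser. This is only a sketch: the pattern and the field labels (`rec_pdb`, `rec_chain`, `lig_pdb`, `lig_name`, `tag`, `pose`) are my reading of these eight names, not an official specification of the CrossDocked naming scheme, and `parse_pose_name` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Pattern inferred from the sample file names in this issue (an assumption,
# not an official naming spec). Accepts both the ligand .sdf names and the
# matching *_pocket10.pdb receptor names.
NAME_RE = re.compile(
    r"^(?P<rec_pdb>[0-9a-z]{4})_(?P<rec_chain>[^_]+)_rec_"
    r"(?P<lig_pdb>[0-9a-z]{4})_(?P<lig_name>[^_]+)_lig_"
    r"(?P<tag>(?:it\d+_)*tt)_docked_(?P<pose>\d+)"
    r"(?:_pocket10\.pdb|\.sdf)$"
)


def parse_pose_name(path: str) -> Optional[dict]:
    """Split a pose file path into its name fields; None if it does not match."""
    match = NAME_RE.match(PurePosixPath(path).name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pose"] = int(fields["pose"])  # pose index as an integer
    return fields


if __name__ == "__main__":
    print(parse_pose_name(
        "RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_it1_it2_tt_docked_0.sdf"
    ))
```

Read this way, every file listed here carries the same `lig_pdb` (2q3t) and `lig_name` (cps) fields, which is why I expected the ligands to be the same molecule.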

yemanbh, Sep 19 '25 15:09

You are not using the CrossDocked files distributed by us. I have checked the files in CrossDock 1.3 distributed here: https://bits.csb.pitt.edu/files/crossdock2020/

All the gninatypes files have the same size (so the same number of atoms), and all the sdf files have the same number of atoms.
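A check of this kind on the sdf files can be reproduced with a few lines of Python. The following is a minimal sketch, not necessarily how the check above was done: it assumes V2000-format records, where the first three characters of the fourth line (the counts line) hold the atom count, and `atom_counts` is a hypothetical helper name. The count covers every atom written in the file, including any explicit hydrogens.

```python
from pathlib import Path


def atom_counts(sdf_text: str) -> list[int]:
    """Return the atom count of each V2000 record in an SDF string."""
    counts = []
    record: list[str] = []

    def flush() -> None:
        # Skip empty trailing chunks (e.g. blank lines after the last $$$$).
        if not any(line.strip() for line in record):
            return
        if len(record) < 4 or "V2000" not in record[3]:
            raise ValueError("expected a V2000 counts line on line 4 of the record")
        # Counts line: atom count is in columns 1-3.
        counts.append(int(record[3][:3]))

    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":  # record terminator
            flush()
            record = []
        else:
            record.append(line)
    flush()  # handle a final record without a terminator
    return counts


if __name__ == "__main__":
    # Print the atom counts for every .sdf file in the current directory.
    for path in sorted(Path(".").glob("*.sdf")):
        print(path.name, atom_counts(path.read_text()))
```

If all poses of one ligand are consistent, every file should report the same number.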

dkoes, Sep 20 '25 22:09

These seem to be errors propagated from older versions of CrossDock. I strongly recommend using version 1.3.

dkoes, Sep 20 '25 23:09

Thank you @dkoes for the quick response. I am using the sampled version of the CrossDocked2020 dataset provided in https://github.com/pengxingang/Pocket2Mol/tree/main/data. This dataset may have been created from an older version of CrossDock.

yemanbh, Sep 22 '25 10:09