About ligand pose file naming and actual ligand PDB IDs
I am trying to understand the CrossDocked2020 dataset. In the dataset, multiple ligand poses are associated with a given receptor. As an example, I show below the receptor and ligand file paths associated with the 3gan chain A receptor and the cps ligand molecule.
I have attached an interactive visualisation of the pocket-ligand pairs here: 3gan-A.html
I have also included an overlapping visualisation of the different receptor files corresponding to 3gan, and an overlapping visualisation of the cps ligand files provided.
In the visualisation, the pockets show some degree of overlap.
The molecules, however, differ in size and structure even though they have the same PDB ID in their file names, with sizes ranging from 14 to 29 as shown below. The bonds and atom types are not consistent across the different molecules/poses.
Question: since the ligand name in all these files is the same, and if these poses were generated by AutoDock Vina, aren't they supposed to have the same number of atoms and differ only in geometry?
Sample file names visualised in the above images.
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_90_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_90.sdf
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_9_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_9.sdf
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_196_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_196.sdf
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_289_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_289.sdf
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_10_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_10.sdf
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_162_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_162.sdf
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_it1_it2_tt_docked_0_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_it1_it2_tt_docked_0.sdf
Receptor: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_286_pocket10.pdb, Ligand: RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_tt_docked_286.sdf
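For reference, the ligand file names above can be split into fields with a short sketch like the one below. The field layout (`<receptor PDB>_<chain>_rec_<ligand PDB>_<ligand name>_lig_<tags>_docked_<index>.sdf`) is my reading of the names listed here, not something taken from the dataset documentation, and `parse_pose_name` is a hypothetical helper. The tags (`tt`, `it1_it2_tt`) are kept as an opaque string.

```python
import re
from typing import Optional

# Assumed layout, inferred from the file names in this post:
# <rec_pdb>_<chain>_rec_<lig_pdb>_<lig_name>_lig_<tags>_docked_<index>.sdf
POSE_NAME = re.compile(
    r"^(?P<rec_pdb>[0-9a-z]{4})_(?P<chain>[^_]+)_rec_"
    r"(?P<lig_pdb>[0-9a-z]{4})_(?P<lig_name>[^_]+)_lig_"
    r"(?P<tags>.+)_docked_(?P<index>\d+)\.sdf$",
    re.IGNORECASE,
)


def parse_pose_name(path: str) -> Optional[dict]:
    """Split a ligand pose path into its name fields, or return None if it does not match."""
    name = path.rsplit("/", 1)[-1]  # drop the directory part
    match = POSE_NAME.match(name)
    return match.groupdict() if match else None


fields = parse_pose_name(
    "RDM1_ARATH_7_163_0/3gan_A_rec_2q3t_cps_lig_it1_it2_tt_docked_0.sdf"
)
print(fields)
```

If this reading is right, all eight ligand files share the same ligand PDB ID (`2q3t`) and ligand name (`cps`), which is why I expected them to contain the same molecule.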
You are not using the CrossDocked files distributed by us. I have checked the files in CrossDock 1.3 distributed here: https://bits.csb.pitt.edu/files/crossdock2020/
All the gninatypes files have the same size (and so the same number of atoms), and all the sdf files have the same number of atoms.
These seem to be errors propagated from older versions of CrossDock. I strongly recommend using version 1.3.
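A check like the one described above can be reproduced on any copy of the dataset with a small script. The following is a minimal sketch, assuming the SDF files are in V2000 format, where the atom count sits in the first three columns of the fourth line of each record. It uses only the standard library; a cheminformatics toolkit would be more robust. The helper names `sdf_atom_counts` and `group_by_atom_count` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def sdf_atom_counts(text: str) -> List[int]:
    """Return the atom count of every V2000 record found in an SDF string."""
    counts = []
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record separator: start a new record
            record = []
            continue
        record.append(line)
        # The counts line is the 4th line of a record; columns 1-3 hold the atom count.
        if len(record) == 4 and "V2000" in line:
            counts.append(int(line[0:3]))
    return counts


def group_by_atom_count(paths: Iterable[Path]) -> Dict[int, List[str]]:
    """Map each atom count to the files that contain a record of that size."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for path in paths:
        for n_atoms in sdf_atom_counts(Path(path).read_text()):
            groups[n_atoms].append(str(path))
    return dict(groups)
```

For example, with `root` as a placeholder for wherever the dataset was unpacked, `group_by_atom_count(Path(root, "RDM1_ARATH_7_163_0").glob("3gan_A_rec_2q3t_cps_lig_*.sdf"))` should return a single key if every pose has the same number of atoms. More than one key means the files disagree.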
Thank you @dkoes for the quick response. I am using the sampled version of the CrossDocked2020 dataset reported in https://github.com/pengxingang/Pocket2Mol/tree/main/data. This dataset may have been created from an older version of CrossDock.