DiffDock
How to reproduce the results in the paper
How can I reproduce the 38.2% of ligand RMSD below 2 Å reported in the paper? I ran the following commands with the same test set and the same conda environment as listed in the repo, but I only get 36.4%. Here are the commands I ran; protein_ligand_example_csv_test.csv is the same as https://github.com/gcorso/DiffDock/blob/main/data/testset_csv.csv:
python datasets/esm_embedding_preparation.py --protein_ligand_csv data/protein_ligand_example_csv_test.csv --out_file data/prepared_for_esm_test.fasta
HOME=../esm/model_weights python ../esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm_test.fasta data/esm2_output --repr_layers 33 --include per_tok --truncation_seq_length 4096
python -m inference --protein_ligand_csv data/protein_ligand_example_csv_test.csv --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
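For reference, the number I compare against the paper is the fraction of test complexes with ligand RMSD below 2 Å. A minimal sketch of that aggregation step (rmsds.npy here is a hypothetical file holding one ligand RMSD per test complex, not something the repo writes out):

# Fraction of complexes with ligand RMSD below 2 Å, as discussed above.
# rmsds.npy is a hypothetical file with one RMSD (in Å) per test complex.
import numpy as np
rmsds = np.load("rmsds.npy")  # shape: (num_complexes,)
print(f"RMSD < 2 Å: {100 * np.mean(rmsds < 2.0):.1f}%")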
Hi, have you run the model multiple times? There is some variance in performance due to randomness (38.2% was the mean of the 3 runs we did).
By 3 runs I assume you mean 3 independent runs with 40 samples per complex each. So are the 40 samples per complex not independently sampled? Do you take the mean of all 120 samples or the mean of the best 3?
Yes, 3 independent runs with 40 samples per complex each
Yes, I have run the model 3 times; the results are 36.36%, 37.11%, and 36.39%.
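The mean over those three runs works out to about 36.62%, still below the 38.2% from the paper:

# Quick check: mean success rate over the three runs reported above.
print(round((36.36 + 37.11 + 36.39) / 3, 2))  # -> 36.62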
Strange, this seems to suggest some difference in performance. Where did you download the PDBBind dataset? It could also be caused by a different version of some library, but I would not be able to pinpoint it to a specific one.
We downloaded the data from https://zenodo.org/record/6408497#.Y_MDMuxBy3I.
Hi, were you able to get some closure on the matter?
Hi @ShuqiLu (and @sw5park),

We recently found something that might be useful and explain the difference. When I updated PyTorch Geometric from 2.0.4 (the version in the original environment we used to produce the results in the paper) to 2.2.0, we observed a drop in performance of approximately 2%, which would correspond to what you are experiencing. I don't know exactly what changes between these two PyG versions (or whether the issue lies with one of the related packages that also get updated), and going forward we will try to use Docker environments for better reproducibility. I'll update the instructions in the README, and it would be great if you could verify that changing the version of this package explains the difference.

Thank you,
Gabriele
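For anyone double-checking their environment, a quick way to confirm which PyTorch Geometric version is installed before running inference (a minimal sketch; 2.0.4 is the version mentioned above):

# Print the installed PyTorch Geometric version; 2.0.4 is the version used for
# the numbers reported in the paper (e.g. pip install torch-geometric==2.0.4).
import torch_geometric
print(torch_geometric.__version__)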