DiffDock
How to reproduce the results in the paper
How can I reproduce the 38.2% of ligand RMSD below 2 Å reported in the paper? I ran the following commands with the same test set and the same conda environment as listed in the repo, but I only get 36.4%. Here are the commands I ran; protein_ligand_example_csv_test.csv is the same as https://github.com/gcorso/DiffDock/blob/main/data/testset_csv.csv:
python datasets/esm_embedding_preparation.py --protein_ligand_csv data/protein_ligand_example_csv_test.csv --out_file data/prepared_for_esm_test.fasta
HOME=../esm/model_weights python ../esm/scripts/extract.py esm2_t33_650M_UR50D data/prepared_for_esm_test.fasta data/esm2_output --repr_layers 33 --include per_tok --truncation_seq_length 4096
python -m inference --protein_ligand_csv data/protein_ligand_example_csv_test.csv --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
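For reference, the number I compare against the paper is the fraction of test complexes with ligand RMSD below 2 Å. A minimal sketch of that aggregation step (rmsds.npy here is a hypothetical file holding one ligand RMSD per test complex, not something the repo writes out):

# Fraction of complexes with ligand RMSD below 2 Å, as discussed above.
# rmsds.npy is a hypothetical file with one RMSD (in Å) per test complex.
import numpy as np
rmsds = np.load("rmsds.npy")  # shape: (num_complexes,)
print(f"RMSD < 2 Å: {100 * np.mean(rmsds < 2.0):.1f}%")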
Hi, have you run the model multiple times? There is some variance in performance due to randomness (38.2% was the mean of the 3 runs we did).
By 3 runs I assume you mean 3 independent runs with 40 samples per complex each. So are the 40 samples per complex not independently sampled? Do you take the mean of all 120 samples or the mean of the best 3?
Yes, 3 independent runs with 40 samples per complex each
Yes, I have run the model 3 times; the results are 36.36%, 37.11%, and 36.39%.
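The mean over those three runs works out to about 36.62%, still below the 38.2% from the paper:

# Quick check: mean success rate over the three runs reported above.
print(round((36.36 + 37.11 + 36.39) / 3, 2))  # -> 36.62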
Strange, this seems to suggest some difference in performance. Where did you download the PDBBind dataset? It could also be caused by a different version of some library, but I would not be able to pinpoint it to a specific one.
We downloaded the data from https://zenodo.org/record/6408497#.Y_MDMuxBy3I.
Hi, were you able to get some closure on the matter?
Hi @ShuqiLu (and @sw5park),

We recently found something that might be useful and explain the difference. When I updated PyTorch Geometric from 2.0.4 (the version in the original environment we used to produce the results in the paper) to 2.2.0, we observed a drop in performance of approximately 2%, which would correspond to what you are experiencing. I don't know exactly what changes between these two PyG versions (or whether the issue lies with one of the related packages that also get updated), and going forward we will try to use Docker environments for better reproducibility. I'll update the instructions in the README, and it would be great if you could verify that changing the version of this package explains the difference.

Thank you,
Gabriele
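For anyone double-checking their environment, a quick way to confirm which PyTorch Geometric version is installed before running inference (a minimal sketch; 2.0.4 is the version mentioned above):

# Print the installed PyTorch Geometric version; 2.0.4 is the version used for
# the numbers reported in the paper (e.g. pip install torch-geometric==2.0.4).
import torch_geometric
print(torch_geometric.__version__)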