DiffDock
I have a question.
Hello, I'm JeongSoo Na from Syntekabio in South Korea. I have a question about how to use DiffDock. When I ran this command,
python3 -m inference --protein_ligand_csv /tmp/input_protein_ligand.csv --out_dir results/6VL4 --inference_steps 20 --samples_per_complex 1 --batch_size 6
this error occurred:
loading data from memory: data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings3164946661/heterographs.pkl
Number of complexes: 0
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:264: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:256: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/DiffDock/inference.py", line 81, in <module>
    test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions,
  File "/content/DiffDock/datasets/pdbbind.py", line 111, in __init__
    print_statistics(self.complex_graphs)
  File "/content/DiffDock/datasets/pdbbind.py", line 361, in print_statistics
    print(f"{name[i]}: mean {np.mean(array)}, std {np.std(array)}, max {np.max(array)}")
  File "<__array_function__ internals>", line 180, in amax
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2791, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
The 6agt target of the test set used by Google Colab worked normally, but this error occurred when using a different pdb structure as the target. Can you help me?
Hi @JeongSooNa ,
Just some general debugging questions/advice:
- Do other proteins (not from the test set) work for you? Maybe there was an error when converting the PDB file to ESM embeddings.
- Are you using SMILES for ligands, or files? Sometimes RDKit struggles to read (mol2) files correctly.
These are some errors I have come across, so maybe something similar is happening in your case. If you want, you could also send me the PDB file and ligands you're using, and I can run them on my system to see whether it works or whether I get the same error. Feel free to send them to my email ([email protected]) if you don't want that data to be completely public!
Kind regards, Jochem
I'll send you an email. Thank you for your kind answer. 😄
Hi @JeongSooNa ,
This error occurs because the preprocessing for all your complexes has failed. To get better insight into the issue, you can remove the cache and add a raise e after the except in the inference file. This should fail and give you a full stack trace that should help identify the problem!
Gabriele
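A minimal sketch of what adding raise e after the except changes, assuming the preprocessing loop follows a common skip-on-error pattern. The names parse_complex and preprocess_all and the debug flag are hypothetical stand-ins for illustration; this is not DiffDock's actual code.

```python
def parse_complex(name):
    """Hypothetical stand-in for the per-complex preprocessing step."""
    if name.startswith("bad"):
        raise ValueError(f"could not parse {name}")
    return {"name": name}


def preprocess_all(names, debug=False):
    """Skip complexes that fail, or re-raise the error when debug is True."""
    graphs = []
    for name in names:
        try:
            graphs.append(parse_complex(name))
        except Exception as e:
            print(f"Skipping {name} because of the error: {e}")
            if debug:
                # Equivalent to manually adding `raise e` after the except:
                # the run stops here and shows the full stack trace.
                raise e
    return graphs


if __name__ == "__main__":
    # Every complex fails, so the list is empty. Statistics such as a
    # maximum over an empty list of graphs would then fail downstream.
    print(len(preprocess_all(["bad_1", "bad_2"])))
```

With debug left off, the failures are only printed and the list of graphs comes back empty, which matches the "Number of complexes: 0" line followed by a reduction over an empty array.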
Thank you @gcorso,
I understand a little bit, but I don't know how to remove the cache or how to add a raise e after the except in the inference file. :( Do I delete all the cache on the server? Or do I need to modify a certain part of the inference file?
Poor Junior JeongSoo Na
@JeongSooNa The cache is located in data/cache_torsion/ and data/cache_torsion_allatoms/. If you remove the contents of those folders, the cache is removed!
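As a hedged sketch, the contents of those two folders could be emptied from Python as shown here. The helper name clear_directory is made up for this example, and the relative paths assume the script is run from the DiffDock repository root.

```python
import shutil
from pathlib import Path


def clear_directory(directory):
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries removed (0 if the directory
    does not exist).
    """
    root = Path(directory)
    if not root.is_dir():
        return 0
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a cached sub-folder recursively
        else:
            entry.unlink()  # remove a plain file or a symlink
        removed += 1
    return removed


if __name__ == "__main__":
    # Paths taken from the reply above; assumed relative to the repo root.
    for cache_dir in ("data/cache_torsion", "data/cache_torsion_allatoms"):
        print(cache_dir, clear_directory(cache_dir))
```

Keeping the folders themselves and deleting only their contents mirrors the advice above; the next inference run then has to rebuild the cache from scratch.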
Thank you @Jnelen,
I deleted the cache, added raise e to the exception part of the script, and tried again, but the ValueError was the same and the following error message was displayed. :(
root@72bcd36a4001:/content/DiffDock# python3 -m inference --protein_path /JUST_DO_IT/input/PDB/6vl4_3chain.pdb --ligand /JUST_DO_IT/input/LIGAND/QY1.sdf --out_dir /JUST_DO_IT/output/6vl4_1 --inference_steps 20 --samples_per_complex 1 --batch_size 6
Reading molecules and generating local structures with RDKit
100%|██████████| 1/1 [00:00<00:00, 14.46it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 0%| | 0/1 [00:00<?, ?it/s]
Skipping /JUST_DO_IT/input/PDB/6vl4_3chain.pdb____/JUST_DO_IT/input/LIGAND/QY1.sdf because of the error: Encountered valid chain id that was not present in the LM embeddings
loading complexes: 100%|██████████| 1/1 [00:00<00:00, 8.42it/s]
loading data from memory: data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings4184974272/heterographs.pkl
Number of complexes: 0
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:264: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:256: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/DiffDock/inference.py", line 81, in <module>
    test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions, #This line is Value Error
  File "/content/DiffDock/datasets/pdbbind.py", line 111, in __init__
    print_statistics(self.complex_graphs) ### this line is important of error
  File "/content/DiffDock/datasets/pdbbind.py", line 361, in print_statistics
    print(f"{name[i]}: mean {np.mean(array)}, std {np.std(array)}, max {np.max(array)}")
  File "<__array_function__ internals>", line 180, in amax
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2791, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
Is there a problem with my server's environment or version?
In the case of 6agt, it works normally.
I think this part is the problem:
Encountered valid chain id that was not present in the LM embeddings
Do I need to check or modify the structure in the PDB file? If so, I wonder how.
JeongSoo Na
Hi @JeongSooNa,
The error you mention:
Encountered valid chain id that was not present in the LM embeddings
I think this happens because the ESM embedding generation failed. I assume something is wrong with your PDB file. As for preparing the PDB structure: I usually remove all waters and ligands. For multichain proteins where the active site is not formed between the different chains, I usually keep only one chain. I also prepare the structure using Maestro's Protein Preparation Wizard to fix any missing residues and add missing hydrogens where necessary.
Kind regards, Jochem
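The first two preparation steps described above (removing waters and ligands, keeping a single chain) can be sketched on the raw text of a PDB file. This is only an illustration under assumptions: the function name clean_pdb_lines is made up, the file is assumed to follow the standard fixed-column PDB layout (chain ID in column 22), and all HETATM records are dropped, which would also discard modified residues stored as HETATM. It does not repair missing residues or add hydrogens as Maestro does.

```python
def clean_pdb_lines(lines, keep_chain=None):
    """Drop HETATM records and, optionally, every chain except `keep_chain`."""
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record == "HETATM":
            # Waters and ligands are stored as HETATM records.
            continue
        if keep_chain is not None and record in ("ATOM", "ANISOU", "TER"):
            # The chain ID is the single character in column 22 (index 21).
            if len(line) > 21 and line[21] != keep_chain:
                continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Tiny made-up example in fixed-column PDB layout.
    example = [
        "ATOM      1  N   GLN A   1",
        "ATOM      2  CA  GLN B   1",
        "HETATM    3  O   HOH A 101",
        "END",
    ]
    for line in clean_pdb_lines(example, keep_chain="A"):
        print(line)
```

Working on lines instead of a parsed structure keeps the sketch dependency-free; a structure library would be a more robust choice for real files.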
Hi @JeongSooNa,
Could you try the same process again, but with the underscore removed from the file name of the receptor (i.e. 6vl4-3chain instead of 6vl4_3chain)?
Gabriele
Hi @Jnelen
I'm sure there's a problem with the PDB file.
However, when I used the 3DWW structure you sent me via email, the following error occurred again. I assumed it was a problem with the structure, but there may instead be a difference in the server environment or the installed tools.
root@72bcd36a4001:/content/DiffDock# python3 -m inference --protein_path /JUST_DO_IT/input/PDB/3DWW.pdb --ligand "c1ccc2c(c1)ccc(n2)COc3ccc(cc3)[C@@H](C4CCCC4)C(=O)O" --out_dir /JUST_DO_IT/output/3DWW --inference_steps 20 --samples_per_complex 1 --batch_size 6
Reading molecules and generating local structures with RDKit
100%|██████████| 1/1 [00:00<00:00, 14.25it/s]
Reading language model embeddings.
Generating graphs for ligands and proteins
loading complexes: 0%| | 0/1 [00:00<?, ?it/s]
Skipping /JUST_DO_IT/input/PDB/3DWW.pdb____c1ccc2c(c1)ccc(n2)COc3ccc(cc3)[C@@H](C4CCCC4)C(=O)O because of the error: Encountered valid chain id that was not present in the LM embeddings
loading complexes: 100%|██████████| 1/1 [00:00<00:00, 2.01it/s]
loading data from memory: data/cache_torsion/limit0_INDEX_maxLigSizeNone_H0_recRad15.0_recMax24_esmEmbeddings1700137435/heterographs.pkl
Number of complexes: 0
/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:264: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
  arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/usr/local/lib/python3.8/dist-packages/numpy/core/_methods.py:256: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/DiffDock/inference.py", line 81, in <module>
    test_dataset = PDBBind(transform=None, root='', protein_path_list=protein_path_list, ligand_descriptions=ligand_descriptions, #This line is Value Error
  File "/content/DiffDock/datasets/pdbbind.py", line 111, in __init__
    print_statistics(self.complex_graphs) ### this line is important of error
  File "/content/DiffDock/datasets/pdbbind.py", line 361, in print_statistics
    print(f"{name[i]}: mean {np.mean(array)}, std {np.std(array)}, max {np.max(array)}")
  File "<__array_function__ internals>", line 180, in amax
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2791, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
First, I will modify the PDB file that I used and try again.
I also wonder whether I should remove the chain distinction when modifying the PDB file.
ex)
ATOM 6725 CD GLN C 143 -12.713 -18.835 -6.789 1.00 3.02 A C
to
ATOM 6725 CD GLN C 143 -12.713 -18.835 -6.789 1.00 3.02 C
Thank you for your response!
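Before editing chain fields by hand, it may help to list which chain IDs the file actually contains, since the error message refers to a chain ID missing from the LM embeddings. The sketch below assumes the standard fixed-column PDB layout, in which the chain ID is the single character in column 22; the function name count_atoms_per_chain is made up for this example. Under that assumption, the chain ID in the example line above would be the "C" after GLN, and the extra "A" near the end of the line appears to sit in a different field.

```python
from collections import Counter


def count_atoms_per_chain(lines):
    """Count ATOM records per chain ID (column 22 of a fixed-column PDB line)."""
    counts = Counter()
    for line in lines:
        if line.startswith("ATOM") and len(line) > 21:
            counts[line[21]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Made-up lines in fixed-column layout; a real file would be read with
    # open(path).read().splitlines() instead.
    example = [
        "ATOM      1  N   GLN A   1",
        "ATOM      2  CA  GLN A   1",
        "ATOM      3  N   GLN C 143",
    ]
    print(count_atoms_per_chain(example))
```

If a chain shows up here with a blank or unexpected ID, that would be a reasonable first thing to compare against the sequences used for the embeddings.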
Hello @gcorso ,
I tried what you advised, but the same issue occurred, so there seems to be a problem with the structure of the PDB file. To solve it, I will try again after modifying the PDB structure.
Thank you, as always!
JeongSoo Na
Hi @JeongSooNa
That structure did not give me any problems. Did it generate any error the first time you ran it? That is when the LM embeddings are generated (if they aren't present yet). Also, did you install the fair-esm and openfold packages correctly? Maybe that could be the cause of the problem. As for the different chains, normally it should recognize them and generate separate embeddings for each one. I haven't had a problem with this before, so I'm not too sure what's causing it.
Kind regards, Jochem
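A quick, hedged way to check whether those packages are visible to the Python interpreter that runs the inference. The sketch assumes that fair-esm is imported under the module name esm and openfold under openfold; the helper name find_missing_modules is made up. It only tests whether the modules can be located, not whether they work correctly.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names are assumptions about how the packages are imported.
    print(find_missing_modules(["esm", "openfold", "torch"]))
```

An empty list means all modules were found; any name that is printed would need to be installed in the same environment that runs python3 -m inference.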