The predicted values are nan
I set up AlphaFold without Docker on our server and ran it on an A100 GPU. During relaxation, the error "simtk.openmm.OpenMMException: Particle coordinate is nan" occurred, as shown below:
I0811 17:57:58.746377 140681736556736 run_alphafold.py:141] Running model model_1
I0811 17:58:21.323568 140681736556736 model.py:132] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
I0811 18:02:36.754542 140681736556736 model.py:140] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,)}
I0811 18:02:36.765380 140681736556736 run_alphafold.py:153] Total JAX model model_1 predict time (includes compilation time, see --benchmark): 255?
Traceback (most recent call last):
File "/home/dearfold/alphafold/run_alphafold.py", line 302, in <module>
app.run(main)
File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/dearfold/alphafold/run_alphafold.py", line 284, in main
random_seed=random_seed)
File "/home/dearfold/alphafold/run_alphafold.py", line 177, in predict_structure
relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
File "/home/dearfold/alphafold/alphafold/relax/relax.py", line 62, in process
max_outer_iterations=self._max_outer_iterations)
File "/home/dearfold/alphafold/alphafold/relax/amber_minimize.py", line 461, in run_pipeline
pdb_string = clean_protein(prot, checks=checks)
File "/home/dearfold/alphafold/alphafold/relax/amber_minimize.py", line 171, in clean_protein
fixed_pdb = cleanup.fix_pdb(pdb_file, alterations_info)
File "/home/dearfold/alphafold/alphafold/relax/cleanup.py", line 55, in fix_pdb
fixer.addMissingAtoms(seed=0)
File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/pdbfixer/pdbfixer.py", line 954, in addMissingAtoms
mm.LocalEnergyMinimizer.minimize(context)
File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 4110, in minimize
return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
simtk.openmm.OpenMMException: Particle coordinate is nan
These are the files in the output directory:
features.pkl msas result_model_1.pkl unrelaxed_model_1.pdb
I checked unrelaxed_model_1.pdb and found that the atom coordinates are written as nan. Below is part of unrelaxed_model_1.pdb:
MODEL 1
ATOM 1 N GLY A 1 nan nan nan 1.00 0.00 N
ATOM 2 CA GLY A 1 nan nan nan 1.00 0.00 C
ATOM 3 C GLY A 1 nan nan nan 1.00 0.00 C
ATOM 4 O GLY A 1 nan nan nan 1.00 0.00 O
ATOM 5 N TRP A 2 nan nan nan 1.00 0.00 N
ATOM 6 CA TRP A 2 nan nan nan 1.00 0.00 C
ATOM 7 C TRP A 2 nan nan nan 1.00 0.00 C
ATOM 8 CB TRP A 2 nan nan nan 1.00 0.00 C
ATOM 9 O TRP A 2 nan nan nan 1.00 0.00 O
ATOM 10 CG TRP A 2 nan nan nan 1.00 0.00 C
ATOM 11 CD1 TRP A 2 nan nan nan 1.00 0.00 C
ATOM 12 CD2 TRP A 2 nan nan nan 1.00 0.00 C
ATOM 13 CE2 TRP A 2 nan nan nan 1.00 0.00 C
ATOM 14 CE3 TRP A 2 nan nan nan 1.00 0.00 C
ATOM 15 NE1 TRP A 2 nan nan nan 1.00 0.00 N
ATOM 16 CH2 TRP A 2 nan nan nan 1.00 0.00 C
ATOM 17 CZ2 TRP A 2 nan nan nan 1.00 0.00 C
ATOM 18 CZ3 TRP A 2 nan nan nan 1.00 0.00 C
ATOM 19 N SER A 3 nan nan nan 1.00 0.00 N
ATOM 20 CA SER A 3 nan nan nan 1.00 0.00 C
ATOM 21 C SER A 3 nan nan nan 1.00 0.00 C
ATOM 22 CB SER A 3 nan nan nan 1.00 0.00 C
ATOM 23 O SER A 3 nan nan nan 1.00 0.00 O
ATOM 24 OG SER A 3 nan nan nan 1.00 0.00 O
ATOM 25 N THR A 4 nan nan nan 1.00 0.00 N
ATOM 26 CA THR A 4 nan nan nan 1.00 0.00 C
ATOM 27 C THR A 4 nan nan nan 1.00 0.00 C
ATOM 28 CB THR A 4 nan nan nan 1.00 0.00 C
ATOM 29 O THR A 4 nan nan nan 1.00 0.00 O
ATOM 30 CG2 THR A 4 nan nan nan 1.00 0.00 C
ATOM 31 OG1 THR A 4 nan nan nan 1.00 0.00 O
So I loaded result_model_1.pkl as a dictionary and found that the predicted values are also nan:
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle as pkl
>>> with open("result_model_1.pkl", "rb") as f:
... d=pkl.load(f)
...
>>> d
{'distogram': {'bin_edges': array([ 2.3125 , 2.625 , 2.9375 , 3.25 , 3.5625 ,
3.875 , 4.1875 , 4.5 , 4.8125 , 5.125 ,
5.4375 , 5.75 , 6.0625 , 6.375 , 6.6875 ,
6.9999995, 7.3125 , 7.625 , 7.9375 , 8.25 ,
8.5625 , 8.875 , 9.1875 , 9.5 , 9.812499 ,
10.124999 , 10.4375 , 10.75 , 11.0625 , 11.375 ,
11.687499 , 12. , 12.3125 , 12.625 , 12.9375 ,
13.25 , 13.5625 , 13.874999 , 14.187501 , 14.499999 ,
14.812499 , 15.124999 , 15.437499 , 15.75 , 16.0625 ,
16.375 , 16.687502 , 16.999998 , 17.312498 , 17.624998 ,
17.937498 , 18.25 , 18.5625 , 18.875 , 19.1875 ,
19.5 , 19.8125 , 20.125 , 20.437498 , 20.75 ,
21.062498 , 21.374998 , 21.6875 ], dtype=float32), 'logits': array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...,
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)}, 'experimentally_resolved': {'logits': array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}, 'masked_msa': {'logits': array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...,
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)}, 'predicted_lddt': {'logits': array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}, 'structure_module': {'final_atom_mask': array([[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 1., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.],
...,
[1., 1., 1., ..., 0., 1., 0.],
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.]], dtype=float32), 'final_atom_positions': array([[[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]],
[[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]],
[[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]],
...,
[[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]],
[[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]],
[[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]]], dtype=float32)}, 'plddt': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan])}
I checked that the features.pkl file looks fine and that the model parameters load correctly. I tested different sequences, but the predicted values were always nan. I guess something went wrong during prediction, but I cannot figure out what it is or how to fix it. Has anyone faced the same issue? Can anyone help me fix it?
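Rather than printing the whole dictionary, a short helper can summarize which outputs contain nan. The following is only a minimal sketch: the function name `find_nan_arrays` is made up for illustration, and the small dictionary at the bottom is synthetic stand-in data, not the actual contents of result_model_1.pkl.

```python
import numpy as np


def find_nan_arrays(obj, prefix=""):
    """Walk a nested dict and return {path: fraction_of_nan} for float arrays with NaN."""
    report = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            report.update(find_nan_arrays(value, path))
    else:
        arr = np.asarray(obj)
        # Only floating-point arrays can hold NaN; skip masks, ints, strings.
        if np.issubdtype(arr.dtype, np.floating) and arr.size:
            frac = float(np.isnan(arr).mean())
            if frac > 0:
                report[prefix] = frac
    return report


# Synthetic stand-in for a result dictionary (hypothetical values).
demo = {
    "distogram": {
        "bin_edges": np.linspace(2.3125, 21.6875, 63, dtype=np.float32),
        "logits": np.full((4, 4, 64), np.nan, dtype=np.float32),
    },
    "structure_module": {
        "final_atom_mask": np.ones((4, 37), dtype=np.float32),
        "final_atom_positions": np.full((4, 37, 3), np.nan, dtype=np.float32),
    },
    "plddt": np.full((4,), np.nan),
}

demo_report = find_nan_arrays(demo)
for path, frac in sorted(demo_report.items()):
    print(f"{path}: {frac:.0%} nan")
```

To apply it to a real run, load the pickle as in the session above and call `find_nan_arrays(d)`. An empty result would mean no float array in the dictionary contains nan.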
Is this still an issue with the latest version of AlphaFold? Also, does it help to run without relax (--run_relax=false)?
When I run with --run_relax=false I don't get the error anymore.
However, I noticed another problem when running with --run_relax=false. For example, with the attached fasta file, the rank_0 model I get is all nan (as attached), while some other models (rank_1, rank_2, ...) may have valid atom coordinates. ranking_debug.json also contains some nan values.
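One way to see which entries of ranking_debug.json are nan is to walk the parsed JSON and collect the paths of nan values. This is a rough sketch only: the helper `nan_entries` is hypothetical, and the sample JSON string (its keys and numbers) is invented for illustration and is not taken from the attached files. Python's `json` module accepts the literal `NaN` by default, so such a file can still be parsed.

```python
import json
import math


def nan_entries(node, path=""):
    """Return the paths of all NaN floats in a parsed JSON structure."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += nan_entries(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += nan_entries(value, f"{path}[{index}]")
    elif isinstance(node, float) and math.isnan(node):
        found.append(path)
    return found


# Invented sample content; a real file would be read with json.load(open(...)).
sample_text = '{"plddts": {"model_1": NaN, "model_2": 81.5}, "order": ["model_1", "model_2"]}'
sample = json.loads(sample_text)
bad = nan_entries(sample)
print(bad)
```

A ranking built from scores that include nan is not reliable, which may be why an all-nan model can end up as rank_0.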
Error.zip
Hi, thanks for the additional information, we will investigate and let you know.
Any update on this topic? I am getting the same error using the multimer protocol.
Is this an issue for all 5 predictions or just some of them?
Just some of them.
It only produced one model and then stopped.
Same issue for --model_preset=multimer, --use_gpu_relax=True with v2.2.0.
This is also the case when --use_gpu_relax=False.
Both stop at
simtk.openmm.OpenMMException: Particle coordinate is nan
I just got approval that I can share the following sequences for debugging purposes. In my case, this nan related bug happened for:
$ cat MPK4_MKK2_Docking.fasta
>MPK4
MSAESCFGSSGDQSSSKGVATHGGSYVQYNVYGNLFEVSRKYVPPLRPIGRGAYGIVCAATNSETGEEVAIKKIGNAFDNIIDAKRTLREIKLLKHMDHENVIAVKDIIKPPQRENFNDVYIVYELMDTDLHQIIRSNQPLTDDHCRFFLYQLLRGLKYVHSANVLHRDLKPSNLLLNANCDLKLGDFGLARTKSETDFMTEYVVTRWYRAPELLLNCSEYTAAIDIWSVGCILGETMTREPLFPGKDYVHQLRLITELIGSPDDSSLGFLRSDNARRYVRQLPQYPRQNFAARFPNMSAGAVDLLEKMLVFDPSRRITVDEALCHPYLAPLHDINEEPVCVRPFNFDFEQPTLTEENIKELIYRETVKFNPQDSV
>MKK2-Docking
MKKGGFSNNLKLAIPVAGE
$ cat run_multimer.sh
#!/bin/bash
## https://sbgrid.org/wiki/examples/alphafold2
### Tips: https://wiki.hpcc.msu.edu/display/ITH/Alphafold
#SBATCH -N 1
#SBATCH --partition=batch
#SBATCH -J AlphaFold.version2.2
#SBATCH -o AlphaFold.v2.2.%J.out
#SBATCH -e AlphaFold.v2.2.%J.err
#SBATCH [email protected]
#SBATCH --mail-type=ALL
#SBATCH --time=24:00:00
#SBATCH --mem=64G
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --constraint=[a100]
module load alphafold/2.2.0/python3_jupyter
export ALPHAFOLD_DATA=/reference/alphafold/2.1.1/all_alphafold_data
export CUDA_VISIBLE_DEVICES=0,1,2,3
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.5
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
python3 $AlphaFold/run_alphafold.py \
--data_dir=$ALPHAFOLD_DATA \
--output_dir=/af2_multimer_run/MPK4_MKK2_Docking \
--fasta_paths=/af2_multimer_run/MPK4_MKK2_Docking/MPK4_MKK2_Docking.fasta \
--max_template_date=2022-05-25 \
--db_preset=full_dbs \
--bfd_database_path=$ALPHAFOLD_DATA/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=$ALPHAFOLD_DATA/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--uniref90_database_path=$ALPHAFOLD_DATA/uniref90/uniref90.fasta \
--mgnify_database_path=$ALPHAFOLD_DATA/mgnify/mgy_clusters_2018_12.fa \
--template_mmcif_dir=$ALPHAFOLD_DATA/pdb_mmcif/mmcif_files \
--model_preset=multimer \
--uniprot_database_path=$ALPHAFOLD_DATA/uniprot/uniprot.fasta \
--pdb_seqres_database_path=$ALPHAFOLD_DATA/pdb_seqres/pdb_seqres.txt \
--obsolete_pdbs_path=$ALPHAFOLD_DATA/pdb_mmcif/obsolete.dat \
--use_gpu_relax=True
--use_gpu_relax=False is also facing nan issue.
Isn't Amber supported on both CPU and GPU? Try actually running with --run_relax=false.
It looks like this problem can be fixed by making a small change, which is necessary when you're using jax 0.3.8 or newer, see #513
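To tell whether an environment falls into the "jax 0.3.8 or newer" case mentioned above, the installed version string can be compared against that threshold. The helpers below are a hypothetical sketch (`version_tuple` and `at_or_above` are not part of AlphaFold or jax); they only compare version numbers and do not apply the change from #513.

```python
import re


def version_tuple(version):
    """Convert '0.3.8' (or '0.3.8.dev1') to (0, 3, 8) using the first three fields."""
    fields = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        fields.append(int(match.group()) if match else 0)
    return tuple(fields)


def at_or_above(installed, threshold="0.3.8"):
    """True if the installed version is the threshold version or newer."""
    return version_tuple(installed) >= version_tuple(threshold)


# In a real environment the string would come from: import jax; jax.__version__
for candidate in ("0.2.14", "0.3.8", "0.3.10"):
    print(candidate, at_or_above(candidate))
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would order "0.3.10" before "0.3.8".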
This has been fixed in https://github.com/deepmind/alphafold/releases/tag/v2.2.4. Closing this issue, feel free to reopen this issue or open a new issue if this is still a problem.