
The predicted values are nan

Open nimijkrap opened this issue 4 years ago • 12 comments

I set up AlphaFold without Docker on our server and ran it on an A100 GPU. During relaxation, the error "simtk.openmm.OpenMMException: Particle coordinate is nan" occurred, as shown below.

I0811 17:57:58.746377 140681736556736 run_alphafold.py:141] Running model model_1
I0811 17:58:21.323568 140681736556736 model.py:132] Running predict with shape(feat) = {'aatype': (4, 68), 'residue_index': (4, 68), 'seq_length': (4,), 'template_aatype': (4, 4, 68), 'template_all_atom_masks': (4, 4, 68, 37), 'template_all_atom_positions': (4, 4, 68, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 68), 'msa_mask': (4, 508, 68), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 68, 3), 'template_pseudo_beta_mask': (4, 4, 68), 'atom14_atom_exists': (4, 68, 14), 'residx_atom14_to_atom37': (4, 68, 14), 'residx_atom37_to_atom14': (4, 68, 37), 'atom37_atom_exists': (4, 68, 37), 'extra_msa': (4, 5120, 68), 'extra_msa_mask': (4, 5120, 68), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 68), 'true_msa': (4, 508, 68), 'extra_has_deletion': (4, 5120, 68), 'extra_deletion_value': (4, 5120, 68), 'msa_feat': (4, 508, 68, 49), 'target_feat': (4, 68, 22)}
I0811 18:02:36.754542 140681736556736 model.py:140] Output shape was {'distogram': {'bin_edges': (63,), 'logits': (68, 68, 64)}, 'experimentally_resolved': {'logits': (68, 37)}, 'masked_msa': {'logits': (508, 68, 23)}, 'predicted_lddt': {'logits': (68, 50)}, 'structure_module': {'final_atom_mask': (68, 37), 'final_atom_positions': (68, 37, 3)}, 'plddt': (68,)}
I0811 18:02:36.765380 140681736556736 run_alphafold.py:153] Total JAX model model_1 predict time (includes compilation time, see --benchmark): 255?
Traceback (most recent call last):
  File "/home/dearfold/alphafold/run_alphafold.py", line 302, in <module>
    app.run(main)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/dearfold/alphafold/run_alphafold.py", line 284, in main
    random_seed=random_seed)
  File "/home/dearfold/alphafold/run_alphafold.py", line 177, in predict_structure
    relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
  File "/home/dearfold/alphafold/alphafold/relax/relax.py", line 62, in process
    max_outer_iterations=self._max_outer_iterations)
  File "/home/dearfold/alphafold/alphafold/relax/amber_minimize.py", line 461, in run_pipeline
    pdb_string = clean_protein(prot, checks=checks)
  File "/home/dearfold/alphafold/alphafold/relax/amber_minimize.py", line 171, in clean_protein
    fixed_pdb = cleanup.fix_pdb(pdb_file, alterations_info)
  File "/home/dearfold/alphafold/alphafold/relax/cleanup.py", line 55, in fix_pdb
    fixer.addMissingAtoms(seed=0)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/pdbfixer/pdbfixer.py", line 954, in addMissingAtoms
    mm.LocalEnergyMinimizer.minimize(context)
  File "/home/dearfold/anaconda3/envs/alphafold/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 4110, in minimize
    return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
simtk.openmm.OpenMMException: Particle coordinate is nan

These are the files in the output directory.

features.pkl  msas  result_model_1.pkl  unrelaxed_model_1.pdb

I checked unrelaxed_model_1.pdb and found that the atom coordinates are written as nan. Below is part of unrelaxed_model_1.pdb.

MODEL     1
ATOM      1  N   GLY A   1         nan     nan     nan  1.00  0.00           N
ATOM      2  CA  GLY A   1         nan     nan     nan  1.00  0.00           C
ATOM      3  C   GLY A   1         nan     nan     nan  1.00  0.00           C
ATOM      4  O   GLY A   1         nan     nan     nan  1.00  0.00           O
ATOM      5  N   TRP A   2         nan     nan     nan  1.00  0.00           N
ATOM      6  CA  TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM      7  C   TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM      8  CB  TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM      9  O   TRP A   2         nan     nan     nan  1.00  0.00           O
ATOM     10  CG  TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     11  CD1 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     12  CD2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     13  CE2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     14  CE3 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     15  NE1 TRP A   2         nan     nan     nan  1.00  0.00           N
ATOM     16  CH2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     17  CZ2 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     18  CZ3 TRP A   2         nan     nan     nan  1.00  0.00           C
ATOM     19  N   SER A   3         nan     nan     nan  1.00  0.00           N
ATOM     20  CA  SER A   3         nan     nan     nan  1.00  0.00           C
ATOM     21  C   SER A   3         nan     nan     nan  1.00  0.00           C
ATOM     22  CB  SER A   3         nan     nan     nan  1.00  0.00           C
ATOM     23  O   SER A   3         nan     nan     nan  1.00  0.00           O
ATOM     24  OG  SER A   3         nan     nan     nan  1.00  0.00           O
ATOM     25  N   THR A   4         nan     nan     nan  1.00  0.00           N
ATOM     26  CA  THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     27  C   THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     28  CB  THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     29  O   THR A   4         nan     nan     nan  1.00  0.00           O
ATOM     30  CG2 THR A   4         nan     nan     nan  1.00  0.00           C
ATOM     31  OG1 THR A   4         nan     nan     nan  1.00  0.00           O
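To see whether every atom is affected or only some, the coordinate columns can be scanned directly. This is a minimal sketch, assuming the standard fixed-width PDB layout (x, y, z in columns 31-54, which matches the excerpt above); the function name `count_nan_atoms` is hypothetical and not part of AlphaFold.

```python
import math


def count_nan_atoms(pdb_text):
    """Return (nan_atoms, total_atoms) over ATOM/HETATM records in a PDB string."""
    nan_atoms = 0
    total = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        total += 1
        # Fixed PDB columns (1-indexed): x = 31-38, y = 39-46, z = 47-54.
        fields = (line[30:38], line[38:46], line[46:54])
        try:
            values = [float(field) for field in fields]
        except ValueError:
            # Coordinates that cannot be parsed at all are counted as bad too.
            nan_atoms += 1
            continue
        if any(math.isnan(v) for v in values):
            nan_atoms += 1
    return nan_atoms, total
```

Reading unrelaxed_model_1.pdb into a string and passing it to this function gives the number of affected atoms next to the total, which distinguishes a fully nan model from a partially broken one.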

So I loaded the result_model_1.pkl file as a dictionary and found that the predicted values are also nan.

Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle as pkl
>>> with open("result_model_1.pkl", "rb") as f:
...     d=pkl.load(f)
...
>>> d
{'distogram': {'bin_edges': array([ 2.3125   ,  2.625    ,  2.9375   ,  3.25     ,  3.5625   ,
        3.875    ,  4.1875   ,  4.5      ,  4.8125   ,  5.125    ,
        5.4375   ,  5.75     ,  6.0625   ,  6.375    ,  6.6875   ,
        6.9999995,  7.3125   ,  7.625    ,  7.9375   ,  8.25     ,
        8.5625   ,  8.875    ,  9.1875   ,  9.5      ,  9.812499 ,
       10.124999 , 10.4375   , 10.75     , 11.0625   , 11.375    ,
       11.687499 , 12.       , 12.3125   , 12.625    , 12.9375   ,
       13.25     , 13.5625   , 13.874999 , 14.187501 , 14.499999 ,
       14.812499 , 15.124999 , 15.437499 , 15.75     , 16.0625   ,
       16.375    , 16.687502 , 16.999998 , 17.312498 , 17.624998 ,
       17.937498 , 18.25     , 18.5625   , 18.875    , 19.1875   ,
       19.5      , 19.8125   , 20.125    , 20.437498 , 20.75     ,
       21.062498 , 21.374998 , 21.6875   ], dtype=float32), 'logits': array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       ...,

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)}, 'experimentally_resolved': {'logits': array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}, 'masked_msa': {'logits': array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       ...,

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)}, 'predicted_lddt': {'logits': array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}, 'structure_module': {'final_atom_mask': array([[1., 1., 1., ..., 0., 0., 0.],
       [1., 1., 1., ..., 1., 0., 0.],
       [1., 1., 1., ..., 0., 0., 0.],
       ...,
       [1., 1., 1., ..., 0., 1., 0.],
       [1., 1., 1., ..., 0., 0., 0.],
       [1., 1., 1., ..., 0., 0., 0.]], dtype=float32), 'final_atom_positions': array([[[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       ...,

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]]], dtype=float32)}, 'plddt': array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan])}

I checked that the features.pkl file is okay and that the model parameters are loaded correctly. I tested different sequences, but the predicted values were always nan. I guess something went wrong during prediction, but I cannot figure out what is wrong or how to fix it. Has anyone faced the same issue? Can anyone help me fix it?
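Rather than printing the whole dictionary, the nested result can be walked to list only the entries that contain nan. This is a rough sketch, assuming the pickle holds nested dicts of NumPy-compatible arrays as in the dump above; `find_nan_arrays` is a hypothetical helper name.

```python
import numpy as np


def find_nan_arrays(obj, prefix=""):
    """Return the key paths of all floating-point arrays that contain NaN."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            hits.extend(find_nan_arrays(value, path))
    else:
        arr = np.asarray(obj)
        # Only floating-point data can hold NaN; skip ints, strings, etc.
        if np.issubdtype(arr.dtype, np.floating) and np.isnan(arr).any():
            hits.append(prefix)
    return hits


# Usage (file name taken from the output directory listed above):
# import pickle
# with open("result_model_1.pkl", "rb") as f:
#     print(find_nan_arrays(pickle.load(f)))
```

For the dump shown above, this would list the logits, final_atom_positions and plddt entries, while bin_edges and final_atom_mask would be left out since they are finite.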

nimijkrap avatar Aug 11 '21 15:08 nimijkrap

Is this still an issue with the latest version of AlphaFold? Also, does it help to run without relax (--run_relax=false)?

Augustin-Zidek avatar Mar 17 '22 15:03 Augustin-Zidek

When I run with --run_relax=false I no longer get the error. However, I noticed another problem in that mode. For example, with the attached FASTA file, the rank_0 model I get is all nan (as attached), while some other models (rank_1, rank_2, ...) may have valid atom coordinates. ranking_debug.json also contains some nan values. Error.zip
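The nan scores in ranking_debug.json can be listed without opening each model. This is a sketch, assuming the file has a top-level dict of per-model scores (for example under a key like "plddts" or "iptm+ptm") alongside an "order" list; that layout and the helper name `nan_entries` are assumptions, not confirmed in this thread.

```python
import json
import math


def nan_entries(ranking_json_text):
    """Return the names whose score is NaN in any top-level dict of scores."""
    # Python's json module accepts the non-standard NaN literal by default.
    data = json.loads(ranking_json_text)
    bad = []
    for scores in data.values():
        if not isinstance(scores, dict):
            continue  # skip non-score entries such as an "order" list
        for name, score in scores.items():
            if isinstance(score, float) and math.isnan(score):
                bad.append(name)
    return bad
```

Models reported here are the ones to exclude before trusting the ranking, since a nan score says nothing about model quality.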

giangpth avatar Mar 17 '22 17:03 giangpth

Hi, thanks for the additional information, we will investigate and let you know.

Augustin-Zidek avatar Mar 22 '22 16:03 Augustin-Zidek

Any update on this topic? I am getting the same error using the multimer protocol.

mcbeaker avatar May 11 '22 21:05 mcbeaker

Is this an issue for all 5 predictions or just some of them?

Augustin-Zidek avatar May 25 '22 13:05 Augustin-Zidek

just some of them

giangpth avatar May 25 '22 19:05 giangpth

It only produced one model and stopped

mcbeaker avatar May 25 '22 19:05 mcbeaker

Same issue for --model_preset=multimer, --use_gpu_relax=True with v2.2.0.

RodenLuo avatar May 26 '22 12:05 RodenLuo

This is also the case when --use_gpu_relax=False.

Both stop at

simtk.openmm.OpenMMException: Particle coordinate is nan

RodenLuo avatar May 26 '22 15:05 RodenLuo

I just got approval to share the following sequences for debugging purposes. In my case, this nan-related bug occurs with:

$ cat MPK4_MKK2_Docking.fasta 
>MPK4
MSAESCFGSSGDQSSSKGVATHGGSYVQYNVYGNLFEVSRKYVPPLRPIGRGAYGIVCAATNSETGEEVAIKKIGNAFDNIIDAKRTLREIKLLKHMDHENVIAVKDIIKPPQRENFNDVYIVYELMDTDLHQIIRSNQPLTDDHCRFFLYQLLRGLKYVHSANVLHRDLKPSNLLLNANCDLKLGDFGLARTKSETDFMTEYVVTRWYRAPELLLNCSEYTAAIDIWSVGCILGETMTREPLFPGKDYVHQLRLITELIGSPDDSSLGFLRSDNARRYVRQLPQYPRQNFAARFPNMSAGAVDLLEKMLVFDPSRRITVDEALCHPYLAPLHDINEEPVCVRPFNFDFEQPTLTEENIKELIYRETVKFNPQDSV
>MKK2-Docking
MKKGGFSNNLKLAIPVAGE
$ cat run_multimer.sh
#!/bin/bash
## https://sbgrid.org/wiki/examples/alphafold2
### Tips: https://wiki.hpcc.msu.edu/display/ITH/Alphafold 
#SBATCH -N 1
#SBATCH --partition=batch
#SBATCH -J AlphaFold.version2.2
#SBATCH -o AlphaFold.v2.2.%J.out
#SBATCH -e AlphaFold.v2.2.%J.err
#SBATCH [email protected]
#SBATCH --mail-type=ALL
#SBATCH --time=24:00:00
#SBATCH --mem=64G
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=32
#SBATCH --constraint=[a100]


module load alphafold/2.2.0/python3_jupyter
export ALPHAFOLD_DATA=/reference/alphafold/2.1.1/all_alphafold_data
export CUDA_VISIBLE_DEVICES=0,1,2,3
export TF_FORCE_UNIFIED_MEMORY=1
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.5
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
python3 $AlphaFold/run_alphafold.py \
 --data_dir=$ALPHAFOLD_DATA \
 --output_dir=/af2_multimer_run/MPK4_MKK2_Docking \
 --fasta_paths=/af2_multimer_run/MPK4_MKK2_Docking/MPK4_MKK2_Docking.fasta \
 --max_template_date=2022-05-25 \
 --db_preset=full_dbs \
 --bfd_database_path=$ALPHAFOLD_DATA/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
 --uniclust30_database_path=$ALPHAFOLD_DATA/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
 --uniref90_database_path=$ALPHAFOLD_DATA/uniref90/uniref90.fasta \
 --mgnify_database_path=$ALPHAFOLD_DATA/mgnify/mgy_clusters_2018_12.fa \
 --template_mmcif_dir=$ALPHAFOLD_DATA/pdb_mmcif/mmcif_files \
 --model_preset=multimer \
 --uniprot_database_path=$ALPHAFOLD_DATA/uniprot/uniprot.fasta \
 --pdb_seqres_database_path=$ALPHAFOLD_DATA/pdb_seqres/pdb_seqres.txt \
 --obsolete_pdbs_path=$ALPHAFOLD_DATA/pdb_mmcif/obsolete.dat \
 --use_gpu_relax=True

The nan issue also occurs with --use_gpu_relax=False.

RodenLuo avatar Jun 06 '22 20:06 RodenLuo

use_gpu_relax=False is also facing nan issue.

Isn't Amber relaxation supported on both CPU and GPU? Try actually running with --run_relax=false.

ValZapod avatar Jun 15 '22 09:06 ValZapod

It looks like this problem can be fixed by making a small change, which is necessary when you're using jax 0.3.8 or newer, see #513
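To find out whether an environment falls in the affected range, the installed jax version can be compared against 0.3.8. This is a minimal sketch; the helper names are hypothetical, and `importlib.metadata` needs Python 3.8 or newer (on Python 3.7, as in the original report, `jax.__version__` can be passed to `needs_change` instead).

```python
from importlib import metadata


def version_tuple(version):
    """Parse the leading numeric components, e.g. '0.3.25' -> (0, 3, 25)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at suffixes such as 'dev20220801'
        parts.append(int(digits))
    return tuple(parts)


def needs_change(installed, threshold="0.3.8"):
    """True if the installed version is at or above the threshold."""
    return version_tuple(installed) >= version_tuple(threshold)


def installed_jax_version():
    """Return the installed jax version string, or None if jax is absent."""
    try:
        return metadata.version("jax")
    except metadata.PackageNotFoundError:
        return None
```

If `installed_jax_version()` returns a string for which `needs_change` is true, the change referenced in #513 is the one to look at.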

boegel avatar Aug 04 '22 14:08 boegel

This has been fixed in https://github.com/deepmind/alphafold/releases/tag/v2.2.4. Closing this issue, feel free to reopen this issue or open a new issue if this is still a problem.

Augustin-Zidek avatar Jan 16 '23 15:01 Augustin-Zidek