'NaN' (Not-a-Number) issue when running ColabFold
Just this afternoon (Mon Feb 27), my ColabFold AF2.3.1 multimer runs suddenly started reporting "NaN" (Not-a-Number) values for the pLDDT, pTM and ipTM metrics, and the program then crashes at the end of the Model 1 run (see error msgs below). I've repeated this with a number of different sequences and rebooted the Colab, with no change. Thx in advance for your expert help and advice.
2023-02-27 21:11:34,081 Setting max_seq=508, max_extra_seq=2048
2023-02-27 21:12:16,142 alphafold2_multimer_v3_model_1_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2023-02-27 21:12:22,278 alphafold2_multimer_v3_model_1_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:28,461 alphafold2_multimer_v3_model_1_seed_000 recycle=2 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:34,688 alphafold2_multimer_v3_model_1_seed_000 recycle=3 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:40,963 alphafold2_multimer_v3_model_1_seed_000 recycle=4 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:47,300 alphafold2_multimer_v3_model_1_seed_000 recycle=5 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:12:53,677 alphafold2_multimer_v3_model_1_seed_000 recycle=6 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:00,076 alphafold2_multimer_v3_model_1_seed_000 recycle=7 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:06,486 alphafold2_multimer_v3_model_1_seed_000 recycle=8 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:12,943 alphafold2_multimer_v3_model_1_seed_000 recycle=9 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:19,443 alphafold2_multimer_v3_model_1_seed_000 recycle=10 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:25,990 alphafold2_multimer_v3_model_1_seed_000 recycle=11 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:32,512 alphafold2_multimer_v3_model_1_seed_000 recycle=12 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:13:32,514 alphafold2_multimer_v3_model_1_seed_000 took 113.3s (12 recycles)
LinAlgError Traceback (most recent call last)
8 frames
/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
     95
     96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97     raise LinAlgError("SVD did not converge")
     98
     99 def _raise_linalgerror_lstsq(err, flag):
LinAlgError: SVD did not converge
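For anyone triaging similar runs, here is a minimal sketch (not part of ColabFold) that scans log text in the format pasted above and lists the model/recycle entries whose metrics are NaN. The function name `find_nan_metrics` and both regular expressions are my own assumptions, inferred only from the log lines in this post; other ColabFold versions may format these lines differently.

```python
import math
import re

# Patterns inferred from the log lines in this thread (assumption, not a ColabFold API), e.g.
# "alphafold2_multimer_v3_model_1_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan"
LINE_RE = re.compile(r"(?P<model>alphafold2_\S+)\s+recycle=(?P<recycle>\d+)\s+(?P<metrics>.*)")
METRIC_RE = re.compile(r"(pLDDT|pTM|ipTM)=(\S+)")


def _is_nan(text):
    """True if the text parses as a float and that float is NaN."""
    try:
        return math.isnan(float(text))
    except ValueError:
        return False


def find_nan_metrics(log_text):
    """Return (model, recycle, [metric names]) for every log entry with a NaN metric."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a per-recycle line
        bad = [name for name, value in METRIC_RE.findall(match.group("metrics"))
               if _is_nan(value)]
        if bad:
            hits.append((match.group("model"), int(match.group("recycle")), bad))
    return hits


# Two lines copied from the log above.
sample_log = """\
2023-02-27 21:12:16,142 alphafold2_multimer_v3_model_1_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2023-02-27 21:12:22,278 alphafold2_multimer_v3_model_1_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan
"""

for model, recycle, bad in find_nan_metrics(sample_log):
    print(model, "recycle", recycle, "NaN metrics:", ", ".join(bad))
```

If the very first recycle already shows up in the result, the NaNs came out of the model itself rather than from later post-processing, which fits the crash happening only afterwards.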
Quick update: This error does not appear when ColabFold 1.5.2 is run in monomer mode (AF2 mode set to 'auto' = ptm for monomer); it only happens with the multimer settings (any of the flavors: v1, v2, or v3). Here's the error msg returned for multimer-v2:
2023-02-27 21:31:34,413 Setting max_seq=252, max_extra_seq=1152
2023-02-27 21:32:04,390 alphafold2_multimer_v2_model_1_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2023-02-27 21:32:07,875 alphafold2_multimer_v2_model_1_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan
2023-02-27 21:32:07,876 alphafold2_multimer_v2_model_1_seed_000 took 29.1s (1 recycles)
LinAlgError Traceback (most recent call last)
8 frames
/usr/local/lib/python3.8/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_svd_nonconvergence(err, flag)
     95
     96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97     raise LinAlgError("SVD did not converge")
     98
     99 def _raise_linalgerror_lstsq(err, flag):
LinAlgError: SVD did not converge
Oddly enough, I also get a 'NaN' error when running DeepMind's AF Colab, which runs AF2.3.1 in multimer mode. This time it crashed during the AMBER relax step (which I'd toggled on); the error message is below. Thx again for your expert help, FB
AMBER relaxation: 83% 5/6 [elapsed: 38:52 remaining: 07:38]
OpenMMException Traceback (most recent call last)
5 frames
/opt/conda/lib/python3.8/site-packages/simtk/openmm/openmm.py in minimize(context, tolerance, maxIterations)
   4108 the maximum number of iterations to perform. If this is 0, minimation is continued until the results converge without regard to how many iterations it takes. The default value is 0.
   4109 """
-> 4110 return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
   4111 swig_destroy = _openmm.delete_LocalEnergyMinimizer
   4112
OpenMMException: Particle coordinate is nan
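The OpenMM message says a particle coordinate is NaN, so presumably the structure handed to the relax step already contained NaN positions. A rough sketch for checking that before relaxing, assuming the unrelaxed model is available as PDB text; `atoms_with_nan_coords` and `make_atom_line` are hypothetical helpers of mine, not part of AlphaFold, ColabFold or OpenMM:

```python
import math


def atoms_with_nan_coords(pdb_text):
    """Return serial numbers of ATOM/HETATM records whose x, y or z is not finite."""
    bad = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        serial = int(line[6:11])
        # Fixed PDB columns (1-based): x = 31-38, y = 39-46, z = 47-54.
        fields = (line[30:38], line[38:46], line[46:54])
        try:
            coords = [float(field) for field in fields]
        except ValueError:
            bad.append(serial)  # an unparsable coordinate is treated as bad
            continue
        if not all(math.isfinite(value) for value in coords):
            bad.append(serial)
    return bad


def make_atom_line(serial, x, y, z):
    """Hypothetical helper: build a CA record for ALA 1, chain A, for the demo below."""
    return f"ATOM  {serial:5d}  CA  ALA A{1:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


demo_pdb = "\n".join([
    make_atom_line(1, 1.0, 2.0, 3.0),
    make_atom_line(2, float("nan"), 0.0, 0.0),
])
print("atoms with non-finite coordinates:", atoms_with_nan_coords(demo_pdb))
```

If this reports any atoms for the unrelaxed output, the relax step is only where the problem surfaces, not where it starts.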
The issue is that Google Colab upgraded to jax 0.4.4. I've now updated the notebook to downgrade to the older jax version that DeepMind recommends for local installations.
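A quick way to see whether a runtime has picked up the newer jax is to compare the installed version against 0.4.4, the version named above. This is only a sketch: the helper names are mine, the 0.4.4 threshold is simply taken from this thread, and I am not claiming that every later release is affected.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.4.4' or '0.4.7+cuda11.cudnn86' into a tuple of ints such as (0, 4, 4)."""
    public = version.split("+", 1)[0]  # drop a local suffix like '+cuda11.cudnn86'
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def at_or_above(installed, threshold="0.4.4"):
    """True if the installed version is the threshold version or newer."""
    return version_tuple(installed) >= version_tuple(threshold)


def installed_jax_version():
    """Return the installed jax version string, or None if jax is not installed."""
    try:
        return metadata.version("jax")
    except metadata.PackageNotFoundError:
        return None


current = installed_jax_version()
if current is None:
    print("jax is not installed in this environment")
elif at_or_above(current):
    print(f"jax {current} is at or above 0.4.4, the version reported in this thread")
else:
    print(f"jax {current} is older than 0.4.4")
```

Running this in a fresh Colab cell before the prediction cell shows whether the downgrade in the notebook actually took effect.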
Thx for the heads-up on the jax version clash! Running the ColabFold again in a quick test (after reboot of the Colab), it looks like another jax issue pops up in the very early stages of running, right after AF2 weights are downloaded:
Downloading alphafold2 weights to .: 100%|██████████| 3.82G/3.82G [03:00<00:00, 22.7MB/s]
KeyError Traceback (most recent call last)
/content/colabfold/batch.py in run(queries, result_dir, num_models, is_complex, num_recycles, recycle_early_stop_tolerance, model_order, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, num_relax, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, num_seeds, recompile_padding, zip_results, prediction_callback, save_single_representations, save_pair_representations, save_all, save_recycles, use_dropout, use_gpu_relax, stop_at_score, dpi, max_seq, max_extra_seq, use_cluster_profile, feature_dict_callback, **kwargs)
   1203 import jax.tools.colab_tpu
-> 1204 jax.tools.colab_tpu.setup_tpu()
   1205 logger.info('Running on TPU')
29 frames
KeyError: 'COLAB_TPU_ADDR'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/OpenSSL/crypto.py in
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
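The first exception in that traceback, `KeyError: 'COLAB_TPU_ADDR'`, presumably just means the runtime has no TPU attached, so the environment variable is not set. A small sketch of checking for it up front; the function names are hypothetical, and this is not how `colabfold/batch.py` does it (the traceback shows it calling `jax.tools.colab_tpu.setup_tpu()` directly). It also does nothing about the second exception from `OpenSSL/crypto.py`.

```python
import os


def colab_tpu_address(environ=None):
    """Return the value of COLAB_TPU_ADDR if it is set, otherwise None."""
    env = os.environ if environ is None else environ
    return env.get("COLAB_TPU_ADDR")


def pick_backend(environ=None):
    """Report 'tpu' only when the Colab TPU variable is present and non-empty."""
    return "tpu" if colab_tpu_address(environ) else "gpu_or_cpu"


print("backend guess for this runtime:", pick_backend())
```

On a GPU runtime this reports `gpu_or_cpu`, which would be the expected state for the runs described here.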
Hi all, I don't know if there is any update on this issue, but AF predictions continue to throw this very same error every time I try to run a prediction. Thanks!
Can you try again, but with the latest version of the notebook?
I just tried with the notebook that was last modified 7 hours ago (latest commit 26ac916, "Next try to pin tensorflow-cpu to 2.11.0"), and the problem is still there.
On 2023-02-28 13:56, Sergey O wrote:
Can you try again, but with latest version of the notebook?
-- Reply to this email directly, view it on GitHub [1], or unsubscribe [2]. You are receiving this because you commented.Message ID: @.***>
Links:
[1] https://github.com/sokrypton/ColabFold/issues/399#issuecomment-1448132781 [2] https://github.com/notifications/unsubscribe-auth/AWR5AKJR45YGBZ6ET474XRDWZXYYFANCNFSM6AAAAAAVJ5HHPU
Tried again this morning (Tue 28th), and sadly I get the same kind of jax issue as in last night's AF2.3.1 multimer run, immediately after the AF2 weights are downloaded. Here's the error msg:
Downloading alphafold2 weights to .: 100%|██████████| 3.82G/3.82G [02:33<00:00, 26.7MB/s]
KeyError Traceback (most recent call last)
/content/colabfold/batch.py in run(queries, result_dir, num_models, is_complex, num_recycles, recycle_early_stop_tolerance, model_order, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, num_relax, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, num_seeds, recompile_padding, zip_results, prediction_callback, save_single_representations, save_pair_representations, save_all, save_recycles, use_dropout, use_gpu_relax, stop_at_score, dpi, max_seq, max_extra_seq, use_cluster_profile, feature_dict_callback, **kwargs)
   1203 import jax.tools.colab_tpu
-> 1204 jax.tools.colab_tpu.setup_tpu()
   1205 logger.info('Running on TPU')
29 frames
KeyError: 'COLAB_TPU_ADDR'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/OpenSSL/crypto.py in
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
I just deployed a fix that should hopefully fix these issues. Please try again.
Just tried again and, before even reaching the AF2 weights download stage, quickly got the same error msg:
KeyError Traceback (most recent call last)
/content/colabfold/batch.py in run(queries, result_dir, num_models, is_complex, num_recycles, recycle_early_stop_tolerance, model_order, num_ensemble, model_type, msa_mode, use_templates, custom_template_path, num_relax, keep_existing_results, rank_by, pair_mode, data_dir, host_url, random_seed, num_seeds, recompile_padding, zip_results, prediction_callback, save_single_representations, save_pair_representations, save_all, save_recycles, use_dropout, use_gpu_relax, stop_at_score, dpi, max_seq, max_extra_seq, use_cluster_profile, feature_dict_callback, **kwargs)
   1203 import jax.tools.colab_tpu
-> 1204 jax.tools.colab_tpu.setup_tpu()
   1205 logger.info('Running on TPU')
29 frames
KeyError: 'COLAB_TPU_ADDR'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
/usr/local/lib/python3.8/site-packages/OpenSSL/crypto.py in
AttributeError: module 'lib' has no attribute 'OpenSSL_add_all_algorithms'
Did you refresh the notebook and session? Please make sure no runtime was already running and that you completely reloaded the notebook.
Latest multimer run was positive, fixes seem to be holding! Many thx, Milot & Sergey
running smooth so far, thanks!
I have been trying to run ColabFold v1.5.2 in multimer mode, but I run into a situation similar to the one described earlier in this thread: it works for monomers but not for multimers.
Fasta file:
>PI:PI
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK:PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
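As a sanity check on the input itself, here is a minimal sketch that splits a FASTA record like the one above into chains at the `:` separator and reports the chain lengths. `parse_complex_fasta` is a hypothetical helper of mine, not ColabFold's parser, and it assumes the header line starts with `>` as in standard FASTA. The total can be compared with the `length 118` that the log in this post reports for the query.

```python
def parse_complex_fasta(text):
    """Parse FASTA text in which ':' separates the chains of one complex.

    Returns a list of (name, [chain sequences]) tuples.
    """
    records = []
    name, seq_lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq_lines).split(":")))
            name, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if name is not None:
        records.append((name, "".join(seq_lines).split(":")))
    return records


# The record from this post (header '>' assumed).
fasta_text = """\
>PI:PI
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK:PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASK
"""

for name, chains in parse_complex_fasta(fasta_text):
    lengths = [len(chain) for chain in chains]
    print(name, "chains:", len(chains), "lengths:", lengths, "total:", sum(lengths))
```

Two identical chains parsed this way suggest the input is a plain homodimer, so the NaNs are unlikely to come from a malformed query.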
Is this a jax issue again, and if so, which version should it be downgraded to?
Thanks a lot for your help. Niels
Output:
[x@g-11-g0002 ~]$ module load tools gcc
[x@g-11-g0002 ~]$ module load cuda/toolkit/11.8.0
[x@g-11-g0002 ~]$ module whatis cuda/toolkit/11.8.0
-------------------------------------------------------------------------------------------- /cm/local/.modulefiles_cache/tools/modulefiles --------------------------------------------------------------------------------------------
cuda/toolkit/11.8.0: NVIDIA CUDA Toolkit 11.8.0 - Develop, Optimize and Deploy GPU-accelerated Apps
[x@g-11-g0002 ~]$ module load cudnn/11.8-8.6.0.163
[x@g-11-g0002 ~]$ module whatis cudnn/11.8-8.6.0.163
-------------------------------------------------------------------------------------------- /cm/local/.modulefiles_cache/tools/modulefiles --------------------------------------------------------------------------------------------
cudnn/11.8-8.6.0.163: NVIDIA cuDNN 11.8-8.6.0.163 for CUDA 11.8 - GPU-accelerated library of primitives for deep neural networks
[x@g-11-g0002 ~]$ module load openmm/8.0.0
[x@g-11-g0002 ~]$ module whatis openmm/8.0.0
-------------------------------------------------------------------------------------------- /cm/local/.modulefiles_cache/tools/modulefiles --------------------------------------------------------------------------------------------
openmm/8.0.0: OpenMM 8.0.0 - High performance, customizable molecular simulation
[x@g-11-g0002 ~]$ module load colabfold/1.5.2
[x@g-11-g0002 ~]$ module whatis colabfold/1.5.2
-------------------------------------------------------------------------------------------- /cm/local/.modulefiles_cache/tools/modulefiles --------------------------------------------------------------------------------------------
colabfold/1.5.2: ColabFold 1.5.2 - Making Protein folding accessible to all!
[x@g-11-g0002 ~]$ python -V
Python 3.9.16
[x@g-11-g0002 ~]$ conda list
/services/tools/openmm/8.0.0/lib/python3.11/site-packages/conda_package_streaming/package_streaming.py:19: UserWarning: zstandard could not be imported. Running without .conda support.
  warnings.warn("zstandard could not be imported. Running without .conda support.")
/services/tools/openmm/8.0.0/lib/python3.11/site-packages/conda_package_handling/api.py:29: UserWarning: Install zstandard Python bindings for .conda support
  _warnings.warn("Install zstandard Python bindings for .conda support")
packages in environment at /services/tools/colabfold/1.5.2:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.4.0 pypi_0 pypi
alphafold-colabfold 2.3.4 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
biopython 1.81 pypi_0 pypi
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2023.01.10 h06a4308_0
cachetools 5.3.0 pypi_0 pypi
certifi 2022.12.7 py39h06a4308_0
charset-normalizer 3.1.0 pypi_0 pypi
chex 0.1.6 pypi_0 pypi
colabfold 1.5.2 pypi_0 pypi
contextlib2 21.6.0 pypi_0 pypi
contourpy 1.0.7 pypi_0 pypi
cudatoolkit 10.2.89 h713d32c_10 conda-forge
cycler 0.11.0 pypi_0 pypi
dm-haiku 0.0.9 pypi_0 pypi
dm-tree 0.1.8 pypi_0 pypi
docker 6.0.1 pypi_0 pypi
fftw 3.3.10 nompi_h77c792f_102 conda-forge
flatbuffers 23.3.3 pypi_0 pypi
fonttools 4.39.3 pypi_0 pypi
gast 0.4.0 pypi_0 pypi
google-auth 2.17.3 pypi_0 pypi
google-auth-oauthlib 1.0.0 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.53.0 pypi_0 pypi
h5py 3.8.0 pypi_0 pypi
hhsuite 3.3.0 py39pl5321h67e14b5_5 bioconda
idna 3.4 pypi_0 pypi
immutabledict 2.2.4 pypi_0 pypi
importlib-metadata 4.13.0 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
jax 0.4.8 pypi_0 pypi
jaxlib 0.4.7+cuda11.cudnn86 pypi_0 pypi
jmp 0.0.4 pypi_0 pypi
kalign2 2.04 hec16e2b_3 bioconda
keras 2.12.0 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libblas 3.9.0 16_linux64_openblas conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libclang 16.0.0 pypi_0 pypi
libffi 3.4.2 h6a678d5_6
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgfortran-ng 12.2.0 h69a702a_19 conda-forge
libgfortran5 12.2.0 h337968e_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libstdcxx-ng 11.2.0 h1234567_1
libuuid 2.38.1 h0b41bf4_0 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
markdown 3.4.3 pypi_0 pypi
markupsafe 2.1.2 pypi_0 pypi
matplotlib 3.7.1 pypi_0 pypi
ml-collections 0.1.1 pypi_0 pypi
ml-dtypes 0.1.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
numpy 1.23.5 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
ocl-icd 2.3.1 h7f98852_0 conda-forge
ocl-icd-system 1.0.0 1 conda-forge
openmm 7.7.0 py39h9717219_1 conda-forge
openssl 3.1.0 h0b41bf4_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandas 1.5.3 pypi_0 pypi
pdbfixer 1.8.1 pyh6c4a22f_0 conda-forge
perl 5.32.1 2_h7f98852_perl5 conda-forge
pillow 9.5.0 pypi_0 pypi
pip 23.0.1 py39h06a4308_0
protobuf 4.22.3 pypi_0 pypi
py3dmol 2.0.1.post1 pypi_0 pypi
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
python 3.9.16 h2782a2a_0_cpython conda-forge
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.9 3_cp39 conda-forge
pytz 2023.3 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.28.2 pypi_0 pypi
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scipy 1.10.1 pypi_0 pypi
setuptools 65.6.3 py39h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
tabulate 0.9.0 pypi_0 pypi
tensorboard 2.12.2 pypi_0 pypi
tensorboard-data-server 0.7.0 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
tensorflow-cpu 2.12.0 pypi_0 pypi
tensorflow-estimator 2.12.0 pypi_0 pypi
tensorflow-io-gcs-filesystem 0.32.0 pypi_0 pypi
termcolor 2.2.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
toolz 0.12.0 pypi_0 pypi
tqdm 4.65.0 pypi_0 pypi
typing-extensions 4.5.0 pypi_0 pypi
tzdata 2023c h04d1e81_0
urllib3 1.26.15 pypi_0 pypi
websocket-client 1.5.1 pypi_0 pypi
werkzeug 2.2.3 pypi_0 pypi
wheel 0.38.4 py39h06a4308_0
wrapt 1.14.1 pypi_0 pypi
xz 5.2.10 h5eee18b_1
zipp 3.15.0 pypi_0 pypi
zlib 1.2.13 h166bdaf_4 conda-forge
[x@g-11-g0002 logs]$colabfold_batch $fasta_file $outputdir --model-type alphafold2_multimer_v2 --data $weights --num-recycle 1 --rank iptm --overwrite-existing-results
2025-01-06 17:20:31,530 Running colabfold 1.5.2 (3e99c44eec189ec27f6d120af851adb7ff6aa2a2)
WARNING: You are welcome to use the default MSA server, however keep in mind that it's a
limited shared resource only capable of processing a few thousand MSAs per day. Please
submit jobs only from a single IP address. We reserve the right to limit access to the
server case-by-case when usage exceeds fair use. If you require more MSAs: You can
precompute all MSAs with colabfold_search or host your own API and pass it to --host-url
2025-01-06 17:20:31,754 Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
2025-01-06 17:20:31,755 Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
2025-01-06 17:20:31,755 Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
2025-01-06 17:20:37,176 Running on GPU
2025-01-06 17:20:37,928 Found 4 citations for tools or databases
2025-01-06 17:20:37,929 Query 1/1: PI_PI (length 118)
COMPLETE: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [elapsed: 00:02 remaining: 00:00]
2025-01-06 17:20:40,170 Setting max_seq=252, max_extra_seq=1152
2025-01-06 17:21:11,904 alphafold2_multimer_v2_model_1_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2025-01-06 17:21:14,388 alphafold2_multimer_v2_model_1_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan
2025-01-06 17:21:14,388 alphafold2_multimer_v2_model_1_seed_000 took 26.6s (1 recycles)
2025-01-06 17:21:16,898 alphafold2_multimer_v2_model_2_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2025-01-06 17:21:19,380 alphafold2_multimer_v2_model_2_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan
2025-01-06 17:21:19,381 alphafold2_multimer_v2_model_2_seed_000 took 5.0s (1 recycles)
2025-01-06 17:21:21,894 alphafold2_multimer_v2_model_3_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2025-01-06 17:21:24,380 alphafold2_multimer_v2_model_3_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan
2025-01-06 17:21:24,380 alphafold2_multimer_v2_model_3_seed_000 took 5.0s (1 recycles)
2025-01-06 17:21:26,894 alphafold2_multimer_v2_model_4_seed_000 recycle=0 pLDDT=nan pTM=nan ipTM=nan
2025-01-06 17:21:29,378 alphafold2_multimer_v2_model_4_seed_000 recycle=1 pLDDT=nan pTM=nan ipTM=nan tol=nan