
ptxas fatal : Ptx assembly aborted due to errors

Status: Open. DrJesseHansen opened this issue 1 year ago · 3 comments

Hello all,

Thank you for the update! However, we are now running into a new error while testing on several different GPUs on our HPC cluster.

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=200GB

#SBATCH --time=48:00:00
#SBATCH --no-requeue

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

MY_PROTEIN_PATH=./test_multi.fasta


echo $HOSTNAME

module purge
module load alphafold/2.3.2

export OPENMM_CUDA_COMPILER=$(which nvcc)

python3 /mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.2/alphafold-2.3.2/run_alphafold.py \
	--model_preset=multimer \
	--fasta_paths=$MY_PROTEIN_PATH \
	--output_dir=$(dirname $MY_PROTEIN_PATH) \
	--data_dir=/nfs/scistore14/rcsb/alphafold.databases2/ \
	--mgnify_database_path=/nfs/scistore14/rcsb/alphafold.databases2/mgnify/mgy_clusters_2022_05.fa \
	--template_mmcif_dir=/nfs/scistore14/rcsb/alphafold.databases2/pdb_mmcif/mmcif_files/ \
	--max_template_date=2023-03-01 \
	--obsolete_pdbs_path=/nfs/scistore14/rcsb/alphafold.databases2/pdb_mmcif/obsolete.dat \
	--use_gpu_relax=true \
	--bfd_database_path=/nfs/scistore14/rcsb/alphafold.databases2/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
	--uniref30_database_path=/nfs/scistore14/rcsb/alphafold.databases2/uniref30/UniRef30_2021_03 \
	--uniref90_database_path=/nfs/scistore14/rcsb/alphafold.databases2/uniref90/uniref90.fasta \
	--pdb_seqres_database_path=/nfs/scistore14/rcsb/alphafold.databases2/pdb_seqres/pdb_seqres.txt \
	--uniprot_database_path=/nfs/scistore14/rcsb/alphafold.databases2/uniprot/uniprot.fasta

The output is below:

I0405 23:07:57.857043 22984107501376 templates.py:256] Found an exact template match 3mi6_C.
I0405 23:07:57.878276 22984107501376 templates.py:256] Found an exact template match 3mi6_C.
I0405 23:07:57.897786 22984107501376 templates.py:256] Found an exact template match 3mi6_D.
I0405 23:07:57.919537 22984107501376 templates.py:256] Found an exact template match 3mi6_D.
I0405 23:07:58.011484 22984107501376 templates.py:256] Found an exact template match 3lj8_A.
I0405 23:07:58.083875 22984107501376 templates.py:256] Found an exact template match 2hxp_A.
I0405 23:07:58.212509 22984107501376 templates.py:256] Found an exact template match 4bvx_B.
I0405 23:07:58.220863 22984107501376 pipeline.py:234] Uniref90 MSA size: 36 sequences.
I0405 23:07:58.220938 22984107501376 pipeline.py:235] BFD MSA size: 36 sequences.
I0405 23:07:58.220975 22984107501376 pipeline.py:236] MGnify MSA size: 3 sequences.
I0405 23:07:58.221007 22984107501376 pipeline.py:237] Final (deduplicated) MSA size: 75 sequences.
I0405 23:07:58.221337 22984107501376 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20.
I0405 23:07:58.258716 22984107501376 run_alphafold.py:216] Running model model_1_multimer_v3_pred_0 on test_multi
I0405 23:07:58.259125 22984107501376 model.py:138] Running predict with shape(feat) = {'aatype': (480,), 'residue_index': (480,), 'seq_length': (), 'msa': (512, 480), 'num_alignments': (), 'template_aatype': (4, 480), 'template_all_atom_mask': (4, 480, 37), 'template_all_atom_positions': (4, 480, 37, 3), 'asym_id': (480,), 'sym_id': (480,), 'entity_id': (480,), 'deletion_matrix': (512, 480), 'deletion_mean': (480,), 'all_atom_mask': (480, 37), 'all_atom_positions': (480, 37, 3), 'assembly_num_chains': (), 'entity_mask': (480,), 'num_templates': (), 'cluster_bias_mask': (512,), 'bert_mask': (512, 480), 'seq_mask': (480,), 'msa_mask': (512, 480)}
2023-04-05 23:08:16.639972: W external/xla/xla/service/gpu/ir_emitter_triton.cc:761] Shared memory size limit exceeded.
2023-04-05 23:08:16.718537: E external/xla/xla/service/gpu/triton_autotuner.cc:271] Failure: INTERNAL: ptxas exited with non-zero error code 65280, output: ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 243; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 247; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 251; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 255; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 259; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 263; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 267; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 271; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 275; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 279; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 283; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 287; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 291; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 295; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 299; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-11ad00b2-2079620-5f89d30d97b88, line 303; error   : Rounding modifier required for instruction 'cvt'
ptxas fatal   : Ptx assembly aborted due to errors

Traceback (most recent call last):
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/alphafold-2.3.4/run_alphafold.py", line 468, in <module>
    app.run(main)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/alphafold-2.3.4/run_alphafold.py", line 443, in main
    predict_structure(
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/alphafold-2.3.4/run_alphafold.py", line 224, in predict_structure
    prediction_result = model_runner.predict(processed_feature_dict,
  File "/nfs/scistore07/clustersw/debian/bullseye/cuda11.2/alphafold/2.3.4/alphafold-2.3.4/alphafold/model/model.py", line 185, in predict
    result, prev = run(sub_key, sub_feat, prev)
  File "/nfs/scistore07/clustersw/debian/bullseye/cuda11.2/alphafold/2.3.4/alphafold-2.3.4/alphafold/model/model.py", line 165, in run
    result = _jnp_to_np(self.apply(self.params, key, {**feat, "prev":prev}))
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 166, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/pjit.py", line 238, in cache_miss
    outs, out_flat, out_tree, args_flat = _python_pjit_helper(
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/pjit.py", line 185, in _python_pjit_helper
    out_flat = pjit_p.bind(*args_flat, **params)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/core.py", line 2592, in bind
    return self.bind_with_trace(top_trace, args, params)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/core.py", line 363, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/core.py", line 817, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/pjit.py", line 1229, in _pjit_call_impl
    compiled = _pjit_lower(
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/interpreters/pxla.py", line 2816, in compile
    self._executable = UnloadedMeshExecutable.from_hlo(
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/interpreters/pxla.py", line 3028, in from_hlo
    xla_executable = dispatch.compile_or_get_cached(
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/dispatch.py", line 526, in compile_or_get_cached
    return backend_compile(backend, serialized_computation, compile_options,
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/profiler.py", line 314, in wrapper
    return func(*args, **kwargs)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/jax/_src/dispatch.py", line 471, in backend_compile
    return backend.compile(built_c, compile_options=options)
jax._src.traceback_util.UnfilteredStackTrace: jaxlib.xla_extension.XlaRuntimeError: INTERNAL: ptxas exited with non-zero error code 65280, output: ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 243; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 247; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 251; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 255; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 259; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 263; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 267; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 271; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 275; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 279; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 283; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 287; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 291; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 295; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 299; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 303; error   : Rounding modifier required for instruction 'cvt'
ptxas fatal   : Ptx assembly aborted due to errors

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

--------------------

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/alphafold-2.3.4/run_alphafold.py", line 468, in <module>
    app.run(main)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/alphafold-2.3.4/run_alphafold.py", line 443, in main
    predict_structure(
  File "/mnt/nfs/clustersw/Debian/bullseye/cuda/11.2/alphafold/2.3.4/alphafold-2.3.4/run_alphafold.py", line 224, in predict_structure
    prediction_result = model_runner.predict(processed_feature_dict,
  File "/nfs/scistore07/clustersw/debian/bullseye/cuda11.2/alphafold/2.3.4/alphafold-2.3.4/alphafold/model/model.py", line 185, in predict
    result, prev = run(sub_key, sub_feat, prev)
  File "/nfs/scistore07/clustersw/debian/bullseye/cuda11.2/alphafold/2.3.4/alphafold-2.3.4/alphafold/model/model.py", line 165, in run
    result = _jnp_to_np(self.apply(self.params, key, {**feat, "prev":prev}))
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: ptxas exited with non-zero error code 65280, output: ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 243; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 247; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 251; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 255; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 259; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 263; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 267; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 271; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 275; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 279; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 283; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 287; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 291; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 295; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 299; error   : Rounding modifier required for instruction 'cvt'
ptxas /tmp/tempfile-gpu227-19a8fd1-2079620-5f89d30dc097c, line 303; error   : Rounding modifier required for instruction 'cvt'
ptxas fatal   : Ptx assembly aborted due to errors

DrJesseHansen, Apr 06 '23 05:04
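Since the same job is being tested on several GPU types, it may help to record which GPU and which toolchain each run actually picks up. The following is a minimal sketch that could be pasted near the top of the job script above, right after `module load`. The helper name `report_tool` is hypothetical, and the checks only show what is on PATH. XLA may locate `ptxas` elsewhere (for example under a CUDA installation directory), so treat this as a first check rather than proof of which assembler is used.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight diagnostics for the AlphaFold job script.

# Print where a tool resolves on PATH, or note that it is missing.
report_tool() {
  local path
  if path=$(command -v "$1" 2>/dev/null); then
    echo "$1: $path"
  else
    echo "$1: not found"
  fi
}

echo "host: ${HOSTNAME:-unknown}"
for tool in nvidia-smi nvcc ptxas python3; do
  report_tool "$tool"
done

# GPU model and driver version, if the NVIDIA tools are available.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
fi

# Toolchain release line printed by ptxas (e.g. "release 11.2").
if command -v ptxas >/dev/null 2>&1; then
  ptxas --version | grep -i release
fi

# JAX/jaxlib versions in the active Python environment, if installed.
python3 -c 'import jax, jaxlib.version; print("jax", jax.__version__, "jaxlib", jaxlib.version.__version__)' 2>/dev/null \
  || echo "jax: not importable"
```

Comparing this output between a GPU node that works and one that fails should show whether the failing runs are resolving an older `ptxas` or a different JAX build.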

Same error here.

Ziyang-Yu, Apr 22 '23 12:04

You can upgrade your CUDA version to 11.8 or later. The error is caused by the PTX ISA: the older ptxas rejects the PTX generated during compilation.

shenh10, Oct 06 '23 04:10
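To check whether the `ptxas` a job picks up is actually from CUDA 11.8 or later (the threshold suggested in the comment above), here is a minimal sketch. It assumes that `ptxas --version` prints a line such as `Cuda compilation tools, release 11.8, V11.8.89` and that GNU `sort -V` is available. It only inspects the first `ptxas` on PATH. The helper names `parse_ptxas_release` and `version_at_least` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: is the ptxas on PATH from CUDA 11.8 or newer?
# Assumption: `ptxas --version` contains a line like
#   "Cuda compilation tools, release 11.8, V11.8.89"

# Extract "MAJOR.MINOR" from ptxas --version output read on stdin.
parse_ptxas_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 >= version $2 (relies on GNU sort -V).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="11.8"
if command -v ptxas >/dev/null 2>&1; then
  where=$(command -v ptxas)
  found=$(ptxas --version | parse_ptxas_release)
  if [ -z "$found" ]; then
    echo "could not parse a release number from $where" >&2
  elif version_at_least "$found" "$REQUIRED"; then
    echo "ptxas $found at $where is >= $REQUIRED"
  else
    echo "ptxas $found at $where is older than $REQUIRED" >&2
  fi
else
  echo "ptxas not found on PATH" >&2
fi
```

If this reports an older release even though a newer CUDA module is loaded, the order of entries on PATH (and any CUDA directory the AlphaFold module sets) is worth inspecting.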

> You can upgrade your cuda version to 11.8 and above. The error is caused by ptx isa.

I still get the same error, even on CUDA 12.2.

Shwetangshu, May 15 '24 08:05