
CUDA error: no kernel image is available for execution on the device (NVIDIA RTX 5090 + CUDA 11.6.2 + torch==1.12.1+cu116)

Open 20254018 opened this issue 5 months ago • 4 comments

**Description**

When trying to run the RFdiffusion inference script on an NVIDIA GeForce RTX 5090 GPU (sm_120), I encounter the following CUDA error:

```
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

At the beginning of the log, I also see this warning:

```
UserWarning: NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA GeForce RTX 5090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
```
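The warning boils down to a mismatch between the architectures the wheel was compiled for and the device's compute capability. A minimal sketch of that check (the arch list below is hard-coded from the warning; on a live system you would read it from `torch.cuda.get_arch_list()` and the capability from `torch.cuda.get_device_capability()`):

```python
# Check whether a PyTorch build's compiled kernel architectures cover a GPU.
# The arch list here is copied from the warning above; with torch installed,
# use torch.cuda.get_arch_list() and torch.cuda.get_device_capability().

def build_supports_capability(arch_list, major, minor):
    """Return True if a compiled sm_XY architecture matches the device's
    compute capability X.Y. A kernel image built only for older archs
    (and without sm_120 PTX) cannot run on a newer device."""
    return f"sm_{major}{minor}" in arch_list

torch_1_12_archs = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]

# RTX 5090 (Blackwell) reports compute capability 12.0 -> needs sm_120.
print(build_supports_capability(torch_1_12_archs, 12, 0))  # False: no kernel image
print(build_supports_capability(torch_1_12_archs, 8, 6))   # True: e.g. an RTX 3090
```

This is exactly why the error appears only at kernel launch time: the wheel installs fine, but no binary for sm_120 exists inside it.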

**How to Reproduce**

I use the following command (the contig and hotspot overrides are quoted so the shell passes them through intact):

```shell
/home/hj/myenv/bin/python /mnt/d/pycharm/RFdiffusion/RFdiffusion-main/RFdiffusion-main/scripts/run_inference.py \
    inference.output_prefix=example_outputs/design_ppi \
    inference.input_pdb=input_pdbs/insulin_target.pdb \
    'contigmap.contigs=[A1-150/0 70-100]' \
    'ppi.hotspot_res=[A59,A83,A91]' \
    inference.num_designs=10 \
    denoiser.noise_scale_ca=0 \
    denoiser.noise_scale_frame=0
```

**Environment**

- GPU: NVIDIA GeForce RTX 5090 (CUDA capability sm_120)
- CUDA: 11.6.2
- Python: 3.9
- Operating system: Ubuntu 20.04
- Container image: `nvcr.io/nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04`
- PyTorch: 1.12.1+cu116
- DGL: 1.0.2+cu116
- Other key dependencies:

  ```
  e3nn==0.3.3
  wandb==0.12.0
  pynvml==11.0.0
  git+https://github.com/NVIDIA/dllogger#egg=dllogger
  decorator==5.1.0
  hydra-core==1.3.2
  pyrsistent==0.19.3
  pytest
  ```

- RFdiffusion: latest main branch

**What I've Tried**

- Confirmed the CUDA driver and GPU work properly.
- Using PyTorch 1.12.1+cu116 (which officially supports only up to sm_86).
- Checked the official PyTorch docs, but found no support for the RTX 5090 (sm_120) yet.
- Tried setting `CUDA_LAUNCH_BLOCKING=1` for a more precise stacktrace, but the core issue persists.

**What help do I need?**

1. How can I get PyTorch to support the RTX 5090 (sm_120) and run RFdiffusion in this environment?
2. Do I have to wait for an official PyTorch release with sm_120 support, or is there a temporary workaround (such as building from source)? If so, can you provide detailed suggestions?
3. If changing the CUDA or PyTorch version could help, which ones do you recommend, and how should I install them?

Any other suggestions or possible workarounds are appreciated!

Thank you for your help!

20254018 avatar Aug 01 '25 08:08 20254018

Hi @liukuozy and RFdiffusion maintainers,

Thanks for your guidance in RosettaCommons/RFdiffusion#380. I followed the steps you shared, but to make the setup reproducible I need precise details. Could you please share:

**Exact environment & versions** (preferably a `pip freeze` or `conda env export`):

- OS + kernel, NVIDIA driver, CUDA toolkit, GPU model (Blackwell variant)
- Conda version; Python 3.12.x (exact patch)
- PyTorch 2.7.1 / torchvision 0.22.1 / torchaudio 2.7.1 (cu128 build tag)
- DGL resolved version from `dglteam/label/th24_cu124`
- hydra-core, pyrsistent, pandas, packaging, pydantic, pyyaml
- Any sensitive deps in RFdiffusion/SE3Transformer (e.g., numpy, scipy, einops, biopython)

I see a “DGL and PyTorch version incompatibility” warning. Is it safe to ignore long-term, or do you recommend a DGL build aligned with cu128?

**Specific code changes**

- Besides removing version pins in `env/SE3Transformer/requirements.txt`, are there any source edits in RFdiffusion or SE3Transformer?
- If yes, please provide file paths, line ranges/snippets, and ideally a minimal diff/patch.
- If fully unpinned deps can break, which packages should be pinned to exact versions?

**Known-good run recipe**

- A minimal command (and config overrides) you've verified on Blackwell.
- Any required env vars or settings (e.g., `CUDA_VISIBLE_DEVICES`, `PYTORCH_CUDA_ALLOC_CONF`, `OMP_NUM_THREADS`, `torch.set_float32_matmul_precision`).

**Optional artifacts**

- A ready-to-use `environment.yml` or `requirements.txt`, and a tiny script/model to validate the install.

I'm happy to contribute a PR/docs once this is reproducible. If helpful, I can share my logs and current `pip freeze`.

---- Replied Message (8/15/2025 14:05) ----

liukuozy left a comment (RosettaCommons/RFdiffusion#380):

For Blackwell, try this:

```shell
git clone https://github.com/RosettaCommons/RFdiffusion.git
mv RFdiffusion/ rfdiffusion
cd rfdiffusion
conda create -n rfdiffusion python=3.12
conda activate rfdiffusion
conda install -c dglteam/label/th24_cu124 dgl
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install hydra-core pyrsistent pandas packaging pydantic pyyaml
cd env/SE3Transformer
nano requirements.txt  # Remove all version specifiers from requirements.txt to let pip choose the versions.
pip install --no-cache-dir -r requirements.txt
python setup.py install
cd ../..
pip install -e .
```
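The `nano requirements.txt` step in the recipe (deleting every version specifier by hand) can also be scripted. A minimal sketch, using a pin-stripping pattern of my own rather than anything from the RFdiffusion repo:

```python
import re

def strip_pins(requirements_text):
    """Remove version specifiers (==, >=, <=, ~=, !=, <, >) from each
    requirement line so pip is free to resolve versions itself."""
    out = []
    for line in requirements_text.splitlines():
        stripped = line.strip()
        # Leave comments, blank lines, and VCS/URL requirements untouched.
        if not stripped or stripped.startswith("#") or "://" in stripped:
            out.append(line)
        else:
            out.append(re.split(r"==|>=|<=|~=|!=|<|>", line, maxsplit=1)[0].strip())
    return "\n".join(out)

example = "e3nn==0.3.3\nwandb>=0.12.0\n# pinned for CI\npytest"
print(strip_pins(example))  # pins dropped; comment and bare names kept as-is
```

Run it over `env/SE3Transformer/requirements.txt` and write the result back before the `pip install -r` step if you prefer a reproducible edit over an interactive one.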

When installing PyTorch, you might get an error about DGL and PyTorch version incompatibility; just ignore it. When running RFdiffusion, it's possible that some pip packages are not installed yet. Use `pip install <package_name>` to install them as needed.
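That last point — discovering missing packages one `ModuleNotFoundError` at a time — can be front-loaded with a quick import probe. A minimal sketch (the module list is illustrative, not the exhaustive RFdiffusion dependency set):

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be imported
    in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Illustrative subset of modules RFdiffusion touches at startup;
# extend this list with whatever your run actually imports.
candidates = ["torch", "dgl", "hydra", "omegaconf", "e3nn", "pyrsistent"]
print("still missing:", missing_modules(candidates))
```

Anything the probe prints can be installed in one `pip install` invocation instead of restarting the run for each failure.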


20254018 avatar Aug 22 '25 07:08 20254018

Have you solved the issue? Same problem...

jufercar avatar Oct 21 '25 15:10 jufercar

Hi, you can have a look at the Dockerfile that I made for my RTX5090.

https://github.com/JMB-Scripts/RFdiffusion-dockerfile-nvidia-RTX5090

Might help you. Cheers!

JMB-Scripts avatar Oct 23 '25 11:10 JMB-Scripts

@JMB-Scripts Thank you so much! Your solution worked for me. Spent several days on a conda and docker solution with no luck. Had to add "--no-check-certificate" to some of the wget commands. Thanks again!

dnanto avatar Oct 27 '25 15:10 dnanto