Cannot install RFDiffusion on VM with X86-64, pytorch_cuda-12.8.1, RTX 5090 GPU
Hello everyone, I am using a VM to install RFdiffusion with the following spec: x86-64, RTX 5090 GPU.

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2025 NVIDIA Corporation
    Built on Fri_Feb_21_20:23:50_PST_2025
    Cuda compilation tools, release 12.8, V12.8.93
    Build cuda_12.8.r12.8/compiler.35583870_0
It gave errors after I installed the SE3-Transformer with the following commands:

    conda env create -f env/SE3nv.yml
    conda activate SE3nv
    cd env/SE3Transformer
    pip install --no-cache-dir -r requirements.txt
    python setup.py install
    cd ../..          # change into the root directory of the repository
    pip install -e .  # install the rfdiffusion module from the root of the repository
The output errors are as follows. I'd appreciate any suggestions or advice! Thank you very much!

      File "/workspace/RFdiffusion/rfdiffusion/inference/model_runners.py", line 722, in sample_step
        msa_prev, pair_prev, px0, state_prev, alpha, logits, plddt = self.model(msa_masked,
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/RFdiffusion/rfdiffusion/RoseTTAFoldModel.py", line 103, in forward
        msa, pair, R, T, alpha_s, state = self.simulator(seq, msa_latent, msa_full, pair, xyz[:,:,:3],
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/RFdiffusion/rfdiffusion/Track_module.py", line 420, in forward
        msa_full, pair, R_in, T_in, state, alpha = self.extra_block[i_m](msa_full,
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/RFdiffusion/rfdiffusion/Track_module.py", line 332, in forward
        R, T, state, alpha = self.str2str(msa, pair, R_in, T_in, xyz, state, idx, motif_mask=motif_mask, cyclic_reses=cyclic_reses, top_k=0)
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
        return func(*args, **kwargs)
      File "/workspace/RFdiffusion/rfdiffusion/Track_module.py", line 266, in forward
        shift = self.se3(G, node.reshape(BL, -1, 1), l1_feats, edge_feats)
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/RFdiffusion/rfdiffusion/SE3_network.py", line 83, in forward
        return self.se3(G, node_features, edge_features)
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/venv/SE3nv/lib/python3.9/site-packages/se3_transformer/model/transformer.py", line 140, in forward
        basis = basis or get_basis(graph.edata['rel_pos'], max_degree=self.max_degree, compute_gradients=False,
      File "/venv/SE3nv/lib/python3.9/site-packages/se3_transformer/model/basis.py", line 166, in get_basis
        with nvtx_range('spherical harmonics'):
      File "/venv/SE3nv/lib/python3.9/contextlib.py", line 119, in __enter__
        return next(self.gen)
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 59, in range
        range_push(msg.format(*args, **kwargs))
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 28, in range_push
        return _nvtx.rangePushA(msg)
      File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 9, in _fail
        raise RuntimeError("NVTX functions not installed. Are you sure you have a CUDA build?")
    RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
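(For reference, the complete trace can be obtained by exporting that variable before re-running the same inference command; the example below assumes the standard scripts/run_inference.py entry point from the repo and a placeholder for the original arguments.)

    export HYDRA_FULL_ERROR=1
    ./scripts/run_inference.py <your original arguments>   # re-run the command that failed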
On some machines, the environment setup installs the CPU-only build of PyTorch rather than the GPU one.
You can check this with conda list | grep pytorch -- if you see cpu in the build/version string, then that's the issue.
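A quick sanity check along these lines (nothing RFdiffusion-specific, just inspecting the torch install inside the activated environment) might look like:

    conda activate SE3nv
    conda list | grep pytorch
    # a CPU-only conda build shows "cpu" in the build string
    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
    # a CPU-only build prints something like: 1.9.1 None False
    # a CUDA build should print a CUDA version and True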
I think you should be able to fix this by reinstalling pytorch and specifying the pytorch channel:

    conda install -c pytorch pytorch=1.9
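If that reinstall still resolves to the CPU variant, one option (a sketch, not tested on this exact setup) is to pin cudatoolkit explicitly so conda is forced to pick the CUDA build; cudatoolkit=11.1 below is an assumption based on the pin in env/SE3nv.yml, so adjust it to match that file:

    conda activate SE3nv
    conda install -c pytorch -c nvidia pytorch=1.9.1 cudatoolkit=11.1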
(All conda commands should be run after doing conda activate SE3nv.)
Thanks a lot for the kind suggestion. Yes, it seems to be PyTorch 1.9.1, CPU version.
Should I remove the CPU PyTorch before running the command to install the GPU one? When I didn't remove the CPU one first, the reinstall still gave the same error.
I'd probably recommend just trying the reinstall first and seeing if it fixes things. If not, then you can try removing before re-installing. (The reason is that removing pytorch might have knock-on effects on other packages, which might not be reversed simply by re-installing pytorch.)
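If you do end up removing it, a minimal sketch of the remove-then-reinstall route (same caveat as above: the cudatoolkit pin is an assumption and should match env/SE3nv.yml) would be:

    conda activate SE3nv
    conda remove --force pytorch       # --force removes only pytorch, leaving its dependents installed
    conda install -c pytorch -c nvidia pytorch=1.9.1 cudatoolkit=11.1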
I had a similar problem and struggled a lot to fix it (mostly because I'm a slow learner), but you can try my Dockerfile here:
https://github.com/JMB-Scripts/RFdiffusion-dockerfile-nvidia-RTX5090
Good luck