
Cannot install RFDiffusion on VM with X86-64, pytorch_cuda-12.8.1, RTX 5090 GPU

Open biocatchen opened this issue 2 months ago • 4 comments

Hello everyone, I am using a VM to install RFdiffusion with the following spec: x86-64, RTX 5090 GPU.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0

It gave errors when running the Conda Install SE3-Transformer code:

conda env create -f env/SE3nv.yml
conda activate SE3nv
cd env/SE3Transformer
pip install --no-cache-dir -r requirements.txt
python setup.py install
cd ../.. # change into the root directory of the repository
pip install -e . # install the rfdiffusion module from the root of the repository

The output errors are as follows. I'd appreciate any suggestions or advice. Thank you very much!

  File "/workspace/RFdiffusion/rfdiffusion/inference/model_runners.py", line 722, in sample_step
    msa_prev, pair_prev, px0, state_prev, alpha, logits, plddt = self.model(msa_masked,
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/RFdiffusion/rfdiffusion/RoseTTAFoldModel.py", line 103, in forward
    msa, pair, R, T, alpha_s, state = self.simulator(seq, msa_latent, msa_full, pair, xyz[:,:,:3],
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/RFdiffusion/rfdiffusion/Track_module.py", line 420, in forward
    msa_full, pair, R_in, T_in, state, alpha = self.extra_block[i_m](msa_full,
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/RFdiffusion/rfdiffusion/Track_module.py", line 332, in forward
    R, T, state, alpha = self.str2str(msa, pair, R_in, T_in, xyz, state, idx, motif_mask=motif_mask, cyclic_reses=cyclic_reses, top_k=0)
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "/workspace/RFdiffusion/rfdiffusion/Track_module.py", line 266, in forward
    shift = self.se3(G, node.reshape(BL, -1, 1), l1_feats, edge_feats)
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/RFdiffusion/rfdiffusion/SE3_network.py", line 83, in forward
    return self.se3(G, node_features, edge_features)
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/venv/SE3nv/lib/python3.9/site-packages/se3_transformer/model/transformer.py", line 140, in forward
    basis = basis or get_basis(graph.edata['rel_pos'], max_degree=self.max_degree, compute_gradients=False,
  File "/venv/SE3nv/lib/python3.9/site-packages/se3_transformer/model/basis.py", line 166, in get_basis
    with nvtx_range('spherical harmonics'):
  File "/venv/SE3nv/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 59, in range
    range_push(msg.format(*args, **kwargs))
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 28, in range_push
    return _nvtx.rangePushA(msg)
  File "/venv/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 9, in _fail
    raise RuntimeError("NVTX functions not installed. Are you sure you have a CUDA build?")
RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

biocatchen, Oct 17 '25 22:10

On some machines, the installation installs the CPU version of pytorch, rather than the GPU one.

You can check this by running conda list | grep pytorch. If you see cpu in the build string of the pytorch entry, then that's the issue.
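If you'd rather script that check, here is a minimal sketch in Python. It assumes the usual conda list column layout (name, version, build, channel), and the sample rows are illustrative examples rather than output from this machine. The function name pytorch_build_kind is made up for this example.

```python
from typing import Optional


def pytorch_build_kind(conda_list_output: str) -> Optional[str]:
    """Classify the pytorch row of `conda list` output.

    Returns 'cuda' or 'cpu' when the build string says so, 'unknown' when a
    pytorch row exists but its build string mentions neither, and None when
    there is no pytorch row at all.
    """
    for line in conda_list_output.splitlines():
        fields = line.split()
        # Assumed row layout: name, version, build, [channel].
        # Header lines start with '#', so they never match the name check.
        if len(fields) >= 3 and fields[0] == "pytorch":
            build = fields[2].lower()
            if "cuda" in build:
                return "cuda"
            if "cpu" in build:
                return "cpu"
            return "unknown"
    return None


# Illustrative rows only (not taken from the machine in this issue).
cpu_sample = "pytorch    1.9.1    py3.9_cpu_0    pytorch"
gpu_sample = "pytorch    1.9.1    py3.9_cuda11.1_cudnn8.0.5_0    pytorch"

print(pytorch_build_kind(cpu_sample))
print(pytorch_build_kind(gpu_sample))
```

To use it on a real environment, pass it the text printed by conda list after activating SE3nv.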

I think you should be able to fix this by reinstalling pytorch, and specifying the pytorch channel: conda install -c pytorch pytorch=1.9

(All conda commands should be run after doing the conda activate SE3nv)
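After reinstalling, one way to confirm which build actually got picked up is to ask PyTorch itself from inside the activated environment. This is only a sketch: describe_torch_build is a name invented here, and it only reports whether the installed build was compiled with CUDA and can see a GPU. It does not tell you whether that build supports a particular card such as the RTX 5090.

```python
def describe_torch_build() -> dict:
    """Report whether the importable torch is a CUDA build that sees a GPU."""
    try:
        import torch
    except ImportError:
        # torch is not importable in this interpreter at all.
        return {"installed": False}

    info = {
        "installed": True,
        "version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If cuda_build prints None, the environment still has the CPU build, which matches the "NVTX functions not installed" error above.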

roccomoretti, Oct 17 '25 22:10

Thanks a lot for the kind suggestion. Yes, it seems PyTorch 1.9.1 is installed as the CPU version.


Should I remove the CPU PyTorch before running the command to install the GPU one? When I didn't remove the CPU one, it still gave the same error.

biocatchen, Oct 17 '25 23:10

I'd probably recommend just trying the reinstall first and seeing if it fixes things. If not, then you can try removing before reinstalling. (The reason is that removing pytorch might have a knock-on effect on other packages, one which might not be reversed simply by reinstalling pytorch.)

roccomoretti, Oct 20 '25 14:10

I had a similar problem and struggled a lot to fix it (mostly because I'm a slow learner), but you can try my Dockerfile here:

https://github.com/JMB-Scripts/RFdiffusion-dockerfile-nvidia-RTX5090

Good luck

JMB-Scripts, Oct 23 '25 10:10