RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

Open Danny234-stack opened this issue 5 months ago • 2 comments

Hi community, while running the macrocyclic example, I got this error:

Error executing job with overrides: ['inference.output_prefix=my_files/out_macrocycle/macrocyclic_test', 'contigmap.contigs=[10-18]', 'inference.cyclic=True', "inference.cyc_chains='a'", 'inference.num_designs=5', 'diffuser.T=50']
Traceback (most recent call last):
  File "D:\Peptide Design\RFdiffusion\scripts\run_inference.py", line 94, in main
    px0, x_t, seq_t, plddt = sampler.sample_step(
  File "d:\peptide design\rfdiffusion\rfdiffusion\inference\model_runners.py", line 686, in sample_step
    msa_prev, pair_prev, px0, state_prev, alpha, logits, plddt = self.model(msa_masked,
  File "D:\miniconda3\envs\SE3nv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\peptide design\rfdiffusion\rfdiffusion\RoseTTAFoldModel.py", line 77, in forward
    msa_latent, pair, state = self.latent_emb(msa_latent, seq, idx, cyclic_reses)
  File "D:\miniconda3\envs\SE3nv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\peptide design\rfdiffusion\rfdiffusion\Embeddings.py", line 96, in forward
    msa = self.emb(msa) # (B, N, L, d_model) # MSA embedding
  File "D:\miniconda3\envs\SE3nv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\miniconda3\envs\SE3nv\lib\site-packages\torch\nn\modules\linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "D:\miniconda3\envs\SE3nv\lib\site-packages\torch\nn\functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Nobody seems to have reported this so far. Please help me solve this problem.

Danny234-stack • Jul 17 '25 15:07

Based on your error message, this appears to be a CUDA issue. Does this happen only with the design_macrocyclic_binder.sh and design_macrocyclic_monomer.sh examples, or with every attempt at running RFdiffusion's inference script?

rclune • Jul 17 '25 16:07

Thank you for getting back to me. I'm not sure why, but when I ran the unconditional monomer example, nothing abnormal happened. The problem occurred again when I reran the macrocycle example.

Danny234-stack • Jul 17 '25 17:07
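
For anyone trying to narrow this down: CUBLAS_STATUS_NOT_INITIALIZED is sometimes reported when GPU memory runs out, or when an earlier asynchronous CUDA error only surfaces at the first cuBLAS call, so it may help to check cuBLAS on its own, outside RFdiffusion. The sketch below is one way to do that, not a confirmed fix for this issue. `check_cublas` is a hypothetical helper name, not part of RFdiffusion or PyTorch, and the matrix size is arbitrary. It also sets `CUDA_LAUNCH_BLOCKING=1`, which makes CUDA errors surface at the call that caused them, and `HYDRA_FULL_ERROR=1`, as the log above suggests.

```python
import os

# Set before CUDA is initialised: kernel launches then run synchronously,
# so an error is raised at the call that actually caused it.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")
# Hydra prints the complete stack trace when this is set (see the log above).
os.environ.setdefault("HYDRA_FULL_ERROR", "1")


def check_cublas(size: int = 64) -> str:
    """Run one small GPU matrix multiply and report what happened.

    Hypothetical helper for illustration; not part of RFdiffusion.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    try:
        a = torch.randn(size, size, device="cuda")
        b = torch.randn(size, size, device="cuda")
        # The matmul goes through cuBLAS; .item() forces a synchronisation.
        (a @ b).sum().item()
    except RuntimeError as err:
        return f"cuBLAS check failed: {err}"
    name = torch.cuda.get_device_name(0)
    return f"cuBLAS OK on {name} (CUDA {torch.version.cuda})"


if __name__ == "__main__":
    print(check_cublas())
```

If this check passes in the SE3nv environment but the macrocycle example still fails, the problem is more likely specific to that run (for example its inputs or memory use) than to the CUDA installation. Rerunning the example with the two environment variables set should then give a more precise traceback.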