RuntimeError: CUDA 'unknown error' Occurs on Subsequent Runs After First Successful Inference
The script (text-to-image) runs successfully on the first attempt, but on the second run, it fails with the following CUDA error:
  File "/data/anaconda3/envs/llmtest/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 262, in forward
    key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
  File "/data/anaconda3/envs/llmtest/lib/python3.10/site-packages/transformers/cache_utils.py", line 447, in update
    self.value_cache[layer_idx] = torch.cat([self.value_cache[layer_idx], value_states], dim=-2)
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
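The error message suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the stack trace points at the kernel that actually failed. Below is a minimal sketch of how that could be done from inside the script. The helper name enable_blocking_launches is hypothetical, and it assumes the variable has to be set before torch initializes CUDA, which is why it is called before any torch import.

```python
import os
import sys


def enable_blocking_launches() -> bool:
    """Set CUDA_LAUNCH_BLOCKING=1 for this process.

    Returns True if torch had not been imported yet, i.e. the variable
    was set early enough that it should be picked up (assumption: the
    variable is read when CUDA is initialized).
    """
    torch_already_imported = "torch" in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    if torch_already_imported:
        # Hedge: setting the variable this late may have no effect.
        print(
            "warning: torch is already imported; "
            "CUDA_LAUNCH_BLOCKING may be ignored",
            file=sys.stderr,
        )
    return not torch_already_imported


# Call this at the very top of the script, before importing torch
# or transformers.
set_early = enable_blocking_launches()
```

The same effect can be had without touching the script by exporting the variable in the shell before launching Python. Synchronous launches slow inference down, so this is only meant for the debugging run.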
transformers 4.51.3
torch 2.6.0
torchvision 0.21.0
pillow 11.2.1
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
Who can help me with this issue? Any guidance would be greatly appreciated. Thank you!