RuntimeError: CUDA error: device-side assert triggered when running 2048.ipynb
A RuntimeError occurs when running 2048.ipynb at this link: https://colab.research.google.com/github/openpipe/art/blob/main/examples/2048/2048.ipynb
loading model from .art/2048-multi-turn/models/agent-002/0010
==((====))== Unsloth 2025.5.1: Fast Qwen2 patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.
\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post2. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipython-input-3745976913.py in <cell line: 0>()
9 print(f"loading model from {lora_model_path}\n")
10
---> 11 peft_model, tokenizer = FastLanguageModel.from_pretrained(
12 model_name=lora_model_path,
13 max_seq_length=16384,
9 frames
/usr/local/lib/python3.11/dist-packages/unsloth/models/llama.py in _set_cos_sin_cache(self, seq_len, device, dtype)
1266 # Different from paper, but it uses a different permutation in order to obtain the same calculation
1267 emb = torch.cat((freqs, freqs), dim=-1)
-> 1268 self.register_buffer("cos_cached", emb.cos().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
1269 self.register_buffer("sin_cached", emb.sin().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
1270 pass
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
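As the error message itself suggests, device-side asserts are reported asynchronously, so the stack trace above may not point at the kernel that actually failed. A minimal sketch of how to make CUDA launches synchronous so the traceback becomes accurate (assumption: this runs in a fresh Colab runtime, before torch or unsloth are imported, since the variable must be set before the first CUDA call in the process):

```python
import os

# CUDA_LAUNCH_BLOCKING=1 forces every kernel launch to synchronize,
# so the Python traceback lands on the call that actually triggered
# the device-side assert. It must be set before the first CUDA
# operation, i.e. before importing torch/unsloth in a fresh runtime.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

After setting this and re-running the cell that calls `FastLanguageModel.from_pretrained`, the reported failure site should be the real one, which makes the underlying cause (for example an out-of-range index or an unsupported dtype on the Tesla T4, which the banner above shows does not support bfloat16) much easier to identify.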