RuntimeError: CUDA error: device-side assert triggered when running 2048.ipynb
A RuntimeError occurs when running 2048.ipynb at this link: https://colab.research.google.com/github/openpipe/art/blob/main/examples/2048/2048.ipynb
loading model from .art/2048-multi-turn/models/agent-002/0010
==((====))== Unsloth 2025.5.1: Fast Qwen2 patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.
\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post2. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipython-input-3745976913.py in <cell line: 0>()
9 print(f"loading model from {lora_model_path}\n")
10
---> 11 peft_model, tokenizer = FastLanguageModel.from_pretrained(
12 model_name=lora_model_path,
13 max_seq_length=16384,
9 frames
/usr/local/lib/python3.11/dist-packages/unsloth/models/llama.py in _set_cos_sin_cache(self, seq_len, device, dtype)
1266 # Different from paper, but it uses a different permutation in order to obtain the same calculation
1267 emb = torch.cat((freqs, freqs), dim=-1)
-> 1268 self.register_buffer("cos_cached", emb.cos().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
1269 self.register_buffer("sin_cached", emb.sin().to(dtype=dtype, device=device, non_blocking=True), persistent=False)
1270 pass
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
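As the error message itself suggests, device-side asserts are reported asynchronously, so the stack trace above may not point at the kernel that actually failed. A minimal sketch of how to make CUDA launches synchronous so the traceback becomes accurate (assumption: this runs in a fresh Colab runtime, before torch or unsloth are imported, since the variable must be set before the first CUDA call in the process):

```python
import os

# CUDA_LAUNCH_BLOCKING=1 forces every kernel launch to synchronize,
# so the Python traceback lands on the call that actually triggered
# the device-side assert. It must be set before the first CUDA
# operation, i.e. before importing torch/unsloth in a fresh runtime.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

After setting this and re-running the cell that calls `FastLanguageModel.from_pretrained`, the reported failure site should be the real one, which makes the underlying cause (for example an out-of-range index or an unsupported dtype on the Tesla T4, which the banner above shows does not support bfloat16) much easier to identify.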