GPTQ-for-LLaMa
CUDA error: unknown error (error when quantizing a LLaMA model)
My config:
WSL2 on Windows 10, GPU: NVIDIA GTX 1660 Super, torch 2.0 installed.
MODEL_DIR points to a 13B LLaMA model folder in HF format (it's Vicuna).
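As a sanity check that CUDA itself works under this WSL2 setup, something like the following minimal sketch can be run first (the tensor size is arbitrary):

```python
import torch

# Confirm the WSL2 CUDA stack is visible to PyTorch.
print(torch.__version__, "CUDA runtime:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("Device 0:", torch.cuda.get_device_name(0))

# Do one real host-to-device copy and kernel launch, the same
# kind of operation that later fails during quantization.
x = torch.randn(1024, 1024, device="cuda")
print("Matmul OK:", (x @ x).sum().item())
```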
When I run CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 python llama.py ${MODEL_DIR} c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save llama7b-4bit-128g.pt (CUDA_LAUNCH_BLOCKING=1 is set only for debugging), I get:
Starting ...
Ready.
Traceback (most recent call last):
  File "/mnt/d/DataScience/GPTQ-for-LLaMa/llama.py", line 452, in <module>
    quantizers = llama_sequential(model, dataloader, DEV)
  File "/home/ostix/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/d/DataScience/GPTQ-for-LLaMa/llama.py", line 72, in llama_sequential
    layer = layers[i].to(dev)
  File "/home/ostix/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/ostix/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/ostix/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/ostix/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/ostix/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: unknown error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
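The traceback shows the failure happens at `layer = layers[i].to(dev)` in `llama_sequential`, i.e. when moving a single decoder layer onto the GPU. That step can be reproduced in isolation with a sketch like this (`nn.Linear` is only a hypothetical stand-in for a real decoder layer; 5120 is the 13B hidden size):

```python
import torch
import torch.nn as nn

# Stand-in for one decoder layer: a float16 module moved to the
# GPU with the same .to(device) call that raises in llama.py.
layer = nn.Linear(5120, 5120).half()
try:
    layer = layer.to("cuda")
    free, total = torch.cuda.mem_get_info()
    print(f"Moved OK; VRAM free/total: {free}/{total} bytes")
except RuntimeError as e:
    print("Same failure outside the quantizer:", e)
```

If this minimal move also fails, the problem would be with the CUDA/WSL2 setup rather than with this repo or the model.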
I tried both the cuda and triton branches and got the same error. Maybe my problem is with Vicuna and not with this repo. Thanks for your help.
Even with the 7B LLaMA model I get the same error.