GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

96 GPTQ-for-LLaMa issues (sorted by recently updated)

When I try to run the quantization pipeline for 16-bit precision, ``` CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-hf/llama-7b c4 --wbits 16 --true-sequential --act-order --save llama7b-16bit.pt ``` it raises an error that quantizers are...
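For context, a likely cause (an assumption based on the usual structure of quantization scripts like this one, not a confirmed diagnosis of llama.py) is that with `--wbits 16` the quantization pass is skipped entirely, so the `quantizers` collection is never created, and the later `--save` step that expects it fails. A minimal standalone sketch of that control-flow pattern:

```python
# Hypothetical sketch only; names and defaults are illustrative,
# not the repository's actual code.
import argparse

def quantize_model(wbits):
    # Per-layer quantizer metadata is produced only when actually quantizing.
    return {"model.layers.0.self_attn.q_proj": ("scale", "zero")}

parser = argparse.ArgumentParser()
parser.add_argument("--wbits", type=int, default=16)
parser.add_argument("--save", type=str, default="model.pt")
args = parser.parse_args()

if args.wbits < 16:
    quantizers = quantize_model(args.wbits)  # only defined on this branch

if args.save:
    # With --wbits 16 this references a name that was never assigned,
    # so Python raises a NameError at save time.
    print(f"packing {len(quantizers)} quantized layers into {args.save}")
```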

My configuration is as follows: - Arch Linux, fully up to date, NVIDIA drivers installed and configured correctly, CUDA installed and configured correctly, the works - Podman image built using...

``` (textgen) ubuntu@anon:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ stdbuf --output=L python -u llama.py ~/text-generation-webui/models/llama-7b-hf c4 \ > --wbits 4 \ > --groupsize 128 \ > --load ~/text-generation-webui/models/llama-7b-4bit-128g_true-seq_act-order.safetensors \ > --benchmark 2048 \ > --check 2>&1...

If we set `bit = 4` and `sym = True` ```python if self.sym: self.zero = torch.full_like(self.scale, (self.maxq + 1) / 2) # maxq = 2 ** 4 - 1 =...
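To make the arithmetic concrete, here is a small standalone sketch (illustration only, not the repository's Quantizer class) of where the symmetric 4-bit zero point lands and how it is used:

```python
import torch

# With 4 bits, maxq = 2**4 - 1 = 15, and symmetric mode pins the zero
# point at (maxq + 1) / 2 = 8, the midpoint of the [0, 15] code range.
bits = 4
maxq = 2 ** bits - 1                             # 15
scale = torch.tensor([0.01])
zero = torch.full_like(scale, (maxq + 1) / 2)    # tensor([8.])

# Quantize and dequantize a few sample weights with these parameters.
w = torch.tensor([-0.05, 0.0, 0.07])
q = torch.clamp(torch.round(w / scale) + zero, 0, maxq)
w_hat = scale * (q - zero)
print(q)      # codes inside [0, 15]
print(w_hat)  # reconstruction is symmetric around 0
```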

(base) ub2004@ub2004-B85M-A0:/data-ssd-1t/hf_model/llama-7b$ git remote -v origin https://huggingface.co/huggyllama/llama-7b (fetch) origin https://huggingface.co/huggyllama/llama-7b (push) (base) ub2004@ub2004-B85M-A0:/data-ssd-1t/hf_model/llama-7b$

Hi, I'm confused about the fine-grained scheme of weight quantization. For example, given a weight matrix W with a size of [4096, 4096] and a groupsize of 128, we perform per-channel quantization, hoping...
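For what it's worth, one way to picture the group layout (a hedged sketch with made-up tensors, assuming groups run along the input dimension, which is what `--groupsize` controls):

```python
import torch

# Illustration only: a [4096, 4096] weight with groupsize 128 splits into
# 4096 / 128 = 32 groups along the input dimension, so each output channel
# carries 32 (scale, zero) pairs instead of a single per-channel pair.
out_features, in_features, groupsize = 4096, 4096, 128
W = torch.randn(out_features, in_features)

groups = in_features // groupsize                 # 32
W_grouped = W.view(out_features, groups, groupsize)

# One toy symmetric 4-bit scale per (output channel, group): shape [4096, 32].
scales = W_grouped.abs().amax(dim=-1) / 7.0
print(W_grouped.shape, scales.shape)
```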

I attempted to run this method on an A100, and I hit the error shown in the following picture. I also tested torch-1.3.1 and torch-2.0.0, and both of them could produce...

Running into this issue when trying to generate with more than around 130 tokens in context on my M40. Generation works fine for small contexts, but errors out at larger contexts...

When I use the 4-bit llama model and run loss.backward(), the following error occurs: Traceback (most recent call last): File "/home/zzy/anaconda3/envs/gptq/lib/python3.9/runpy.py", line 197, in _run_module_as_main return _run_code(code, main_globals, None, File...
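As background (an assumption about the usual failure mode, not a diagnosis of this exact traceback): the packed 4-bit matmul kernels are typically forward-only, so autograd has no backward for them. A common workaround is to dequantize to a floating-point weight and go through a differentiable `F.linear`; a minimal sketch with made-up shapes and a hypothetical `dequantize` helper:

```python
import torch
import torch.nn.functional as F

def dequantize(qweight, scales, zeros):
    # qweight: 4-bit codes already unpacked to int8, shape [out, in];
    # scales/zeros: per-output-channel quantization parameters.
    return (qweight.float() - zeros) * scales

out_features, in_features = 8, 16
qweight = torch.randint(0, 16, (out_features, in_features), dtype=torch.int8)
scales = torch.rand(out_features, 1)
zeros = torch.full((out_features, 1), 8.0)

x = torch.randn(4, in_features, requires_grad=True)
w = dequantize(qweight, scales, zeros)  # plain float tensor
y = F.linear(x, w)                      # differentiable w.r.t. x
loss = y.pow(2).mean()
loss.backward()                         # gradient reaches the input
print(x.grad.shape)                     # torch.Size([4, 16])
```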

My config: WSL2 on Windows 10, GPU -> NVIDIA 1660 Super, torch 2.0 installed, MODEL_DIR points to a 13B LLaMA model HF-type folder (it's Vicuna). When I run...