GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

96 GPTQ-for-LLaMa issues (sorted by recently updated)

When I try to run the quantization pipeline for 16-bit precision, ``` CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-hf/llama-7b c4 --wbits 16 --true-sequential --act-order --save llama7b-16bit.pt ``` it raises an error that quantizers are...
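For context, a likely cause (an assumption based on the usual structure of quantization scripts like this one, not a confirmed diagnosis of llama.py) is that with `--wbits 16` the quantization pass is skipped entirely, so the `quantizers` collection is never created, and the later `--save` step that expects it fails. A minimal standalone sketch of that control-flow pattern:

```python
# Hypothetical sketch only; names and defaults are illustrative,
# not the repository's actual code.
import argparse

def quantize_model(wbits):
    # Per-layer quantizer metadata is produced only when actually quantizing.
    return {"model.layers.0.self_attn.q_proj": ("scale", "zero")}

parser = argparse.ArgumentParser()
parser.add_argument("--wbits", type=int, default=16)
parser.add_argument("--save", type=str, default="model.pt")
args = parser.parse_args()

if args.wbits < 16:
    quantizers = quantize_model(args.wbits)  # only defined on this branch

if args.save:
    # With --wbits 16 this references a name that was never assigned,
    # so Python raises a NameError at save time.
    print(f"packing {len(quantizers)} quantized layers into {args.save}")
```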

My configuration is as follows: - Arch Linux, fully up to date, NVIDIA drivers installed and configured correctly, CUDA installed and configured correctly, the works - Podman image built using...

``` (textgen) ubuntu@anon:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ stdbuf --output=L python -u llama.py ~/text-generation-webui/models/llama-7b-hf c4 \ > --wbits 4 \ > --groupsize 128 \ > --load ~/text-generation-webui/models/llama-7b-4bit-128g_true-seq_act-order.safetensors \ > --benchmark 2048 \ > --check 2>&1...

If we set `bit = 4` and `sym = True` ```python if self.sym: self.zero = torch.full_like(self.scale, (self.maxq + 1) / 2) # maxq = 2 ** 4 - 1 =...
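To make the arithmetic concrete, here is a small standalone sketch (illustration only, not the repository's Quantizer class) of where the symmetric 4-bit zero point lands and how it is used:

```python
import torch

# With 4 bits, maxq = 2**4 - 1 = 15, and symmetric mode pins the zero
# point at (maxq + 1) / 2 = 8, the midpoint of the [0, 15] code range.
bits = 4
maxq = 2 ** bits - 1                             # 15
scale = torch.tensor([0.01])
zero = torch.full_like(scale, (maxq + 1) / 2)    # tensor([8.])

# Quantize and dequantize a few sample weights with these parameters.
w = torch.tensor([-0.05, 0.0, 0.07])
q = torch.clamp(torch.round(w / scale) + zero, 0, maxq)
w_hat = scale * (q - zero)
print(q)      # codes inside [0, 15]
print(w_hat)  # reconstruction is symmetric around 0
```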

(base) ub2004@ub2004-B85M-A0:/data-ssd-1t/hf_model/llama-7b$ git remote -v origin https://huggingface.co/huggyllama/llama-7b (fetch) origin https://huggingface.co/huggyllama/llama-7b (push) (base) ub2004@ub2004-B85M-A0:/data-ssd-1t/hf_model/llama-7b$

Hi, I'm confused about the fine-grained scheme of weight quantization. For example, given a weight matrix W with a size of [4096, 4096] and a groupsize of 128, we perform per-channel quantization, hoping...
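For what it's worth, one way to picture the group layout (a hedged sketch with made-up tensors, assuming groups run along the input dimension, which is what `--groupsize` controls):

```python
import torch

# Illustration only: a [4096, 4096] weight with groupsize 128 splits into
# 4096 / 128 = 32 groups along the input dimension, so each output channel
# carries 32 (scale, zero) pairs instead of a single per-channel pair.
out_features, in_features, groupsize = 4096, 4096, 128
W = torch.randn(out_features, in_features)

groups = in_features // groupsize                 # 32
W_grouped = W.view(out_features, groups, groupsize)

# One toy symmetric 4-bit scale per (output channel, group): shape [4096, 32].
scales = W_grouped.abs().amax(dim=-1) / 7.0
print(W_grouped.shape, scales.shape)
```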

I attempted to run this method on an A100, and I hit the error shown in the following picture. I also tested torch-1.3.1 and torch-2.0.0, and both of them could produce...

Running into this issue when trying to generate with more than around 130 tokens in context on my M40. Generation works fine for small contexts, but errors out at larger contexts...

When I use the 4-bit llama model and run loss.backward(), the following error occurs: Traceback (most recent call last): File "/home/zzy/anaconda3/envs/gptq/lib/python3.9/runpy.py", line 197, in _run_module_as_main return _run_code(code, main_globals, None, File...
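As background (an assumption about the usual failure mode, not a diagnosis of this exact traceback): the packed 4-bit matmul kernels are typically forward-only, so autograd has no backward for them. A common workaround is to dequantize to a floating-point weight and go through a differentiable `F.linear`; a minimal sketch with made-up shapes and a hypothetical `dequantize` helper:

```python
import torch
import torch.nn.functional as F

def dequantize(qweight, scales, zeros):
    # qweight: 4-bit codes already unpacked to int8, shape [out, in];
    # scales/zeros: per-output-channel quantization parameters.
    return (qweight.float() - zeros) * scales

out_features, in_features = 8, 16
qweight = torch.randint(0, 16, (out_features, in_features), dtype=torch.int8)
scales = torch.rand(out_features, 1)
zeros = torch.full((out_features, 1), 8.0)

x = torch.randn(4, in_features, requires_grad=True)
w = dequantize(qweight, scales, zeros)  # plain float tensor
y = F.linear(x, w)                      # differentiable w.r.t. x
loss = y.pow(2).mean()
loss.backward()                         # gradient reaches the input
print(x.grad.shape)                     # torch.Size([4, 16])
```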

My config: WSL2 on Windows 10, GPU -> NVIDIA 1660 Super, torch 2.0 installed, MODEL_DIR points to a 13B LLaMA model HF-type folder (it's Vicuna). When I run...