
4-bit quantization of LLaMa using GPTQ

96 GPTQ-for-LLaMa issues, sorted by recently updated

ptb_text_only uses the **validation** file instead of the **test** file. While it is still from the same dataset and should give similar results, this makes **1-to-1 comparisons** difficult....
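For context, a minimal sketch (assuming the Hugging Face `datasets` package and the standard `ptb_text_only` configuration name) of loading the two splits side by side:

```python
from datasets import load_dataset

# ptb_text_only exposes separate validation and test splits; perplexity measured
# on one split is not directly comparable to numbers reported on the other.
valdata = load_dataset("ptb_text_only", "penn_treebank", split="validation")
testdata = load_dataset("ptb_text_only", "penn_treebank", split="test")

print(len(valdata), len(testdata))
```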

If I follow the instructions in the readme, I'm getting an error now even though it worked a few days ago. ``` conda create --name gptq python=3.9 -y conda activate...

LLaMa-13B-GPTQ-4-128 says C4 scores 7.60. That seems out of place compared to 16, 8, and 3 bits. Was that a typo, intended to be 6.60 or 6.70?

When I try to use the model, I see errors on every layer in the model: ``` size mismatch for model.layers.77.mlp.down_proj.scales: copying a param with shape torch.Size([8192, 1]) from checkpoint,...
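This is the generic failure mode where `load_state_dict` reports a size mismatch for every tensor whose checkpoint shape differs from the module's shape, which typically happens when the checkpoint was quantized with different settings (for example a different group size) than the loading code expects. A minimal, self-contained illustration, with shapes chosen purely for the example:

```python
import torch
import torch.nn as nn

# A layer whose parameter shape does not match the checkpoint being loaded.
layer = nn.Linear(8, 8, bias=False)
checkpoint = {"weight": torch.zeros(4, 8)}  # shape from a differently configured export

try:
    layer.load_state_dict(checkpoint)
except RuntimeError as err:
    print(err)  # "size mismatch for weight: copying a param with shape ..."
```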

I attempt to install using `python setup_cuda.py install` and get the error trace below. Checking my NVIDIA CUDA install, I get the following trace: nvcc: NVIDIA (R) Cuda compiler driver...
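Before rebuilding, one quick check worth running is whether the system `nvcc` matches the CUDA version PyTorch was compiled against; a mismatch is a common cause of `setup_cuda.py` build failures. A small sketch using only standard PyTorch attributes:

```python
import torch

# The CUDA version PyTorch was built with should line up (at least in major
# version) with the toolkit reported by `nvcc --version`.
print("torch version:       ", torch.__version__)
print("torch built for CUDA:", torch.version.cuda)
print("CUDA runtime usable: ", torch.cuda.is_available())
```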

Is it possible to run quantization on the CPU? Or quantize layer-by-layer without loading the whole model into VRAM? I want to quantize a large model, but it does not fit in VRAM.
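One way to keep peak VRAM low, offered as a sketch rather than the repository's actual implementation: quantize the decoder blocks one at a time, moving only the current block to the GPU. Here `quantize_block` is a hypothetical per-block GPTQ step, `model.model.layers` assumes the Hugging Face LLaMA layout, and attention-mask/position arguments are omitted for brevity:

```python
import torch

@torch.no_grad()
def quantize_sequentially(model, hidden_states, quantize_block, device="cuda"):
    """Quantize LLaMA decoder blocks one by one so that only a single block
    (plus the calibration activations) lives in VRAM at any time."""
    for block in model.model.layers:             # HF LLaMA decoder blocks
        block.to(device)                         # only this block occupies VRAM
        quantize_block(block, hidden_states)     # hypothetical per-block GPTQ step
        hidden_states = block(hidden_states)[0]  # activations for the next block
        block.to("cpu")                          # free GPU memory again
        torch.cuda.empty_cache()
    return model
```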

Hello, First of all, thank you for making this. The error happens when I'm trying to execute `setup_cuda.py`. Is it possible for you (or anyone) to compile the same library...

Using latest `main` ``` (textgen) acidhax@PC:~/text-generation-webui$ python server.py --listen --auto-devices --model llama-13b-hf --gptq-bits 4 Loading llama-13b-hf... Traceback (most recent call last): File "/home/acidhax/text-generation-webui/server.py", line 242, in shared.model, shared.tokenizer = load_model(shared.model_name)...

Working on getting an example of building a quantized 7B file on Colab. Seems useful to have, both to compile instructions for users and to test new versions of PyTorch /...

From a quick test, I noticed that the 4-bit code gets linearly slower as I increase the batch size:
- bs=1: 1.97 s
- bs=8: 15.5 s
- bs=64: 127 s
...
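A small timing sketch to reproduce the measurement; `generate_batch` is a hypothetical stand-in for one 4-bit generation pass at a given batch size:

```python
import time
import torch

def time_batches(generate_batch, batch_sizes=(1, 8, 64)):
    # Time one generation pass per batch size, synchronizing so the GPU work
    # is actually finished before the clock is read.
    for bs in batch_sizes:
        torch.cuda.synchronize()
        start = time.time()
        generate_batch(bs)
        torch.cuda.synchronize()
        print(f"bs={bs}: {time.time() - start:.2f} s")
```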