
4-bit quantization of LLaMa using GPTQ

96 GPTQ-for-LLaMa issues, sorted by recently updated

ptb_text_only uses the **validation** file instead of the **test** file. While it is still from the same dataset and should give similar results, this makes **1-to-1 comparisons** difficult....
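For context, a minimal sketch (assuming the Hugging Face `datasets` package and the standard `ptb_text_only` configuration name) of loading the two splits side by side:

```python
from datasets import load_dataset

# ptb_text_only exposes separate validation and test splits; perplexity measured
# on one split is not directly comparable to numbers reported on the other.
valdata = load_dataset("ptb_text_only", "penn_treebank", split="validation")
testdata = load_dataset("ptb_text_only", "penn_treebank", split="test")

print(len(valdata), len(testdata))
```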

If I follow the instructions in the readme, I'm getting an error now even though it worked a few days ago. ``` conda create --name gptq python=3.9 -y conda activate...

LLaMa-13B-GPTQ-4-128 says C4 scores 7.60. That seems out of place compared to 16, 8, and 3 bits. Was that a typo, intended to be 6.60 or 6.70?

When I try to use the model, I see errors on every layer in the model: ``` size mismatch for model.layers.77.mlp.down_proj.scales: copying a param with shape torch.Size([8192, 1]) from checkpoint,...
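This is the generic failure mode where `load_state_dict` reports a size mismatch for every tensor whose checkpoint shape differs from the module's shape, which typically happens when the checkpoint was quantized with different settings (for example a different group size) than the loading code expects. A minimal, self-contained illustration, with shapes chosen purely for the example:

```python
import torch
import torch.nn as nn

# A layer whose parameter shape does not match the checkpoint being loaded.
layer = nn.Linear(8, 8, bias=False)
checkpoint = {"weight": torch.zeros(4, 8)}  # shape from a differently configured export

try:
    layer.load_state_dict(checkpoint)
except RuntimeError as err:
    print(err)  # "size mismatch for weight: copying a param with shape ..."
```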

I attempt to install using `python setup_cuda.py install` and get the error trace below. Checking my NVIDIA CUDA install, I get the following trace: nvcc: NVIDIA (R) Cuda compiler driver...
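Before rebuilding, one quick check worth running is whether the system `nvcc` matches the CUDA version PyTorch was compiled against; a mismatch is a common cause of `setup_cuda.py` build failures. A small sketch using only standard PyTorch attributes:

```python
import torch

# The CUDA version PyTorch was built with should line up (at least in major
# version) with the toolkit reported by `nvcc --version`.
print("torch version:       ", torch.__version__)
print("torch built for CUDA:", torch.version.cuda)
print("CUDA runtime usable: ", torch.cuda.is_available())
```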

Is it possible to run quantization on the CPU? Or quantize layer-by-layer without loading the whole model into VRAM? I want to quantize a large model, but it does not fit in VRAM.
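One way to keep peak VRAM low, offered as a sketch rather than the repository's actual implementation: quantize the decoder blocks one at a time, moving only the current block to the GPU. Here `quantize_block` is a hypothetical per-block GPTQ step, `model.model.layers` assumes the Hugging Face LLaMA layout, and attention-mask/position arguments are omitted for brevity:

```python
import torch

@torch.no_grad()
def quantize_sequentially(model, hidden_states, quantize_block, device="cuda"):
    """Quantize LLaMA decoder blocks one by one so that only a single block
    (plus the calibration activations) lives in VRAM at any time."""
    for block in model.model.layers:             # HF LLaMA decoder blocks
        block.to(device)                         # only this block occupies VRAM
        quantize_block(block, hidden_states)     # hypothetical per-block GPTQ step
        hidden_states = block(hidden_states)[0]  # activations for the next block
        block.to("cpu")                          # free GPU memory again
        torch.cuda.empty_cache()
    return model
```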

Hello, First of all, thank you for making this. The error happens when I'm trying to execute `setup_cuda.py`. Is it possible for you (or anyone) to compile the same library...

Using latest `main` ``` (textgen) acidhax@PC:~/text-generation-webui$ python server.py --listen --auto-devices --model llama-13b-hf --gptq-bits 4 Loading llama-13b-hf... Traceback (most recent call last): File "/home/acidhax/text-generation-webui/server.py", line 242, in shared.model, shared.tokenizer = load_model(shared.model_name)...

Working on getting an example of building a quantized 7B file on Colab. Seems useful to have, both to compile instructions for users and to test new versions of PyTorch /...

From a quick test, I noticed that the 4-bit code gets linearly slower as I increase the batch size:
- bs=1: 1.97 s
- bs=8: 15.5 s
- bs=64: 127 s
...
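A small timing sketch to reproduce the measurement; `generate_batch` is a hypothetical stand-in for one 4-bit generation pass at a given batch size:

```python
import time
import torch

def time_batches(generate_batch, batch_sizes=(1, 8, 64)):
    # Time one generation pass per batch size, synchronizing so the GPU work
    # is actually finished before the clock is read.
    for bs in batch_sizes:
        torch.cuda.synchronize()
        start = time.time()
        generate_batch(bs)
        torch.cuda.synchronize()
        print(f"bs={bs}: {time.time() - start:.2f} s")
```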