
4-bit quantization of LLaMA using GPTQ

96 GPTQ-for-LLaMa issues (sorted by recently updated)

Is it possible to run GPTQ on a machine that has only CPUs? If not, is there a plan for it?

Has anyone compared the inference speed of the 4-bit quantized model with the original FP16 model? Is it faster than the original FP16 model?
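One rough way to check this yourself is with the repo's `--benchmark` mode, which appears in the commands quoted further down this list. This is only a sketch: it assumes the same flag also runs against the unquantized FP16 checkpoint, and the model path and checkpoint name are placeholders, not values from this issue.

```
# Hypothetical speed comparison using llama.py's benchmark mode.
# FP16 baseline (no --wbits / --load):
CUDA_VISIBLE_DEVICES=0 python llama.py ${MODEL_DIR} c4 --benchmark 2048

# 4-bit GPTQ checkpoint:
CUDA_VISIBLE_DEVICES=0 python llama.py ${MODEL_DIR} c4 --wbits 4 --groupsize 128 \
    --load llama7b-4bit-128g.pt --benchmark 2048
```

Comparing the per-token timings reported by the two runs would answer the question for a given GPU.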

Mistral 7B is dominating the local LLM scene right now and your software doesn't load it. I need your software to work with it... Can we please make your software...

I followed the tutorial in the README to run the code, but when I run this command: ```CUDA_VISIBLE_DEVICES=0 python llama.py ${MODEL_DIR} c4 --wbits 4 --groupsize 128 --load llama7b-4bit-128g.pt --benchmark 2048...

https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/e985b700f19e670bad9b949cd83056889dd31448/neox.py#L302 This line needs `import math` at the top of the file.
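Until the import is added upstream, a minimal local workaround (assuming a GNU sed environment; otherwise just add the line by hand in an editor) could be:

```
# Prepend the missing import to the top of neox.py (GNU sed syntax)
sed -i '1i import math' neox.py
```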

1. What changes would I need to make for GPTQ to support LoRA for Llama 2? 2. What's the main difference between GPTQ and bitsandbytes? Is it that GPTQ re-adjusts...

```
CUDA_VISIBLE_DEVICES=0 python llama.py /mnt/g/models/conceptofmind_LLongMA-2-13b c4 --wbits 4 --true-sequential --act-order --groupsize 32 --save_safetensors /mnt/g/models/LLongMA-2-13b-16k-GPTQ/4bit-32g-tsao.safetensors
Found cached dataset json (/home/anon/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/anon/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Token indices sequence length is longer...
```

Why does model quantization print "Killed" at the end? ![Untitled](https://github.com/qwopqwop200/GPTQ-for-LLaMa/assets/119348639/34c11719-cd98-4db9-80b7-e9589fba7296)