GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
Could someone help me with **how to quantize my own model with GPTQ-for-LLaMa**? See the screenshot of the output I am getting :cry: **Original full model**: https://huggingface.co/Glavin001/startup-interviews-13b-int4-2epochs-1 **Working quantized model with...
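For reference, quantizing a custom LLaMA-family checkpoint generally follows the same `llama.py` invocation used in the other reports on this page; the sketch below assumes a local HF-format model directory, and the path and output file name are placeholders.

```
# Minimal sketch: quantize a local LLaMA-family finetune to 4-bit, group size 128.
# MODEL_DIR is a placeholder for the HF-format checkpoint directory.
MODEL_DIR=./startup-interviews-13b
CUDA_VISIBLE_DEVICES=0 python llama.py ${MODEL_DIR} c4 \
    --wbits 4 --true-sequential --act-order --groupsize 128 \
    --save startup-interviews-13b-4bit-128g.pt
```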
Got the same error as [issue 142](https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/142#issuecomment-1507778779) - AttributeError: module 'triton.compiler' has no attribute 'OutOfResources' - after applying @geekypathak21's solution (see [PR 1505](https://github.com/openai/triton/pull/1505)) for working around the matmul issue on pre-Volta...
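A first check for this kind of AttributeError is to confirm which Triton build is actually installed, since version skew between the installed Triton and the one the kernels were written against is a common cause; the snippet below only inspects the environment.

```
# Check the installed Triton version and where it came from;
# a missing triton.compiler.OutOfResources usually points at a version mismatch.
python -c "import triton; print(triton.__version__)"
pip show triton
```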
Hello everyone, recently I noticed a lack of 4-bit quantized versions of `Google/flan-ul2` on HF, and so decided to set out to quantize the model on my 4090. I struggled...
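For rough context, a back-of-the-envelope estimate (assuming flan-ul2's roughly 20B parameters at 4 bits per weight, and ignoring group scales, zeros, and activation memory) shows why a 24 GB 4090 is workable but tight for this model:

```
# ~20e9 parameters * 0.5 bytes/parameter, converted to GiB (rough estimate only).
python -c "print(f'{20e9 * 0.5 / 2**30:.1f} GiB')"   # ~9.3 GiB for the weights alone
```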
I have the following problem: `model=Honkware/openchat_8192-GPTQ` `text-generation-launcher --model-id $model --num-shard 1 --quantize gptq --port 8080`
```
Traceback (most recent call last):
  File "/home/abalogh/anaconda3/envs/text-generation-inference/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
    ^^^^^...
```
I tried to test GPTQ's PPL metrics on the OPT model via opt.py. The PPL metrics of the OPT model are normal with fake quantization. However, when...
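One way to narrow this down is to compare the simulated-quantization evaluation against a run that reloads the packed checkpoint so the real kernels are exercised; the flags below are an assumption about `opt.py`'s interface, and the model name and file names are placeholders.

```
# Pass 1: quantize OPT and note the PPL reported with simulated (fake) quantization.
CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-1.3b c4 \
    --wbits 4 --groupsize 128 --save opt-1.3b-4bit-128g.pt
# Pass 2: reload the packed checkpoint (assumed to run the real kernels)
# and compare the reported PPL against pass 1.
CUDA_VISIBLE_DEVICES=0 python opt.py facebook/opt-1.3b c4 \
    --wbits 4 --groupsize 128 --load opt-1.3b-4bit-128g.pt
```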
(textgen) quanlian@quanlian-System-Product-Name:~/aigc/text-generation-webui/repositories/GPTQ-for-LLaMa$ python setup_cuda.py install running install /home/quanlian/mambaforge/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated. !! ******************************************************************************** Please avoid running ``setup.py`` directly. Instead, use pypa/build, pypa/installer or other standards-based tools. See...
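The deprecation warning itself is non-fatal; a quick way to confirm the build actually succeeded is to import the extension the setup script produces (module name assumed to be `quant_cuda`, matching the extension name in setup_cuda.py).

```
# If this import succeeds, the CUDA kernel extension was built and installed.
python -c "import quant_cuda; print('quant_cuda OK')"
```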
I converted the LLaMA weights and quantized them, but I got this error when I ran inference. Could someone help me and let me know how I can fix it? Thanks! Here...
Hi, I ran bloom.py using fp16 to test the perplexity (PPL) of BLOOM on the Wikitext-2, PTB, and C4 datasets. The results are 11.79 / 20.14 / 17.68, which is...
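For anyone comparing numbers, an FP16 baseline run presumably looks like the sketch below; the model path is a placeholder, and it is an assumption that leaving `--wbits` at 16 skips quantization and only evaluates on the three datasets.

```
# FP16 baseline perplexity run with bloom.py; BLOOM_DIR is a placeholder path or HF id.
BLOOM_DIR=bigscience/bloom-7b1
CUDA_VISIBLE_DEVICES=0 python bloom.py ${BLOOM_DIR} c4 --wbits 16
```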
When I run the script `CUDA_VISIBLE_DEVICES=0 python llama.py ${MODEL_DIR} c4 --wbits 4 --true-sequential --act-order --groupsize 128 --eval --save llama7b-4bit-128g.pt &>baseline.txt &` I get the same PPL as the README, but when infer...
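For a quick sanity check of the saved checkpoint outside the evaluation path, the repo's inference script can be pointed at the same file; the flags below are assumed to mirror the quantization command above, and the prompt is arbitrary.

```
# Load the packed 4-bit checkpoint and generate from a short prompt.
CUDA_VISIBLE_DEVICES=0 python llama_inference.py ${MODEL_DIR} \
    --wbits 4 --groupsize 128 --load llama7b-4bit-128g.pt \
    --text "this is llama"
```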
First: thanks for this implementation. I'm using it to load 7B models on my 8 GiB GPU via Ooba Gooba (which fails to report how much memory it used,...
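When the web UI does not report usage, `nvidia-smi` gives a direct read on how much VRAM the loaded 4-bit model is actually taking:

```
# Query current and total GPU memory; run this while the model is loaded.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```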