
4-bit quantization of LLaMA using GPTQ
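For context, quantizing a model with this repo is a single invocation of `llama.py`. Here is a minimal sketch, assuming a local Hugging Face checkpoint; the paths are placeholders, and the flags mirror the commands quoted in the issues below:

```
# Quantize a LLaMA checkpoint to 4-bit GPTQ (paths are placeholders).
# c4 is the calibration dataset; --groupsize 128 is the common setting
# seen throughout these issues.
python llama.py ./llama-7b-hf c4 \
    --wbits 4 \
    --true-sequential \
    --groupsize 128 \
    --save_safetensors llama-7b-4bit-128g.safetensors
```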

96 GPTQ-for-LLaMa issues, sorted by most recently updated:

I'm trying to run text-generation-webui on my computer. I'm pretty limited with 8 GB of RAM, but I have an RTX 3060 Ti to run it on. When running 7B...
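For reference, launching text-generation-webui with a GPTQ model generally looked like the following at the time. This is a sketch, assuming a webui version with the old GPTQ command-line flags; the model name is a placeholder, and the flags may differ in newer releases:

```
# Start the webui with a 4-bit GPTQ model (model name is a placeholder;
# these flags are assumed from the webui of this era and may have changed).
# --wbits and --groupsize must match how the checkpoint was quantized.
python server.py --model llama-7b-4bit-128g --wbits 4 --groupsize 128
```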

Hey there. I'm trying to install, following the steps here: https://aituts.com/llama/ I've gotten to Step 3, which says to do the following:
```
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa...
```
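The excerpt is cut off, but the quoted step amounts to cloning this repo into the webui's repositories folder. A minimal sketch, assuming text-generation-webui's root as the working directory (the excerpt's trailing ... may hide extra arguments such as a branch flag):

```
# Run from inside the text-generation-webui folder (assumed).
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
```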

Hi, I'm interested in running LLaMA 4-bit GPTQ, but I don't have a GPU. Is it possible to run this model on CPU only?

Hello, does anyone know what's wrong with this?
```
PS C:\Users\Max\llama\repositories\GPTQ-for-LLaMa> conda info

     active environment : None
        user config file : C:\Users\Max\.condarc
  populated config files :
           conda version : 4.14.0...
```

Hello, and thank you for this project. One thing I don't understand: in the GPTQ vs [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) comparison, what do nf4 and the other listed options mean?

I followed the instructions in https://github.com/lm-sys/FastChat/blob/main/docs/gptq.md and loaded the model, but there is an error when loading it with `python3 -m fastchat.serve.model_worker --model-path models/llama-2-7B-GPTQ --gptq-wbits 4 --gptq-groupsize 128`: 2024-02-12 21:31:13...
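Broken across lines for readability, the failing invocation from the excerpt is:

```
# FastChat worker invocation exactly as quoted above.
python3 -m fastchat.serve.model_worker \
    --model-path models/llama-2-7B-GPTQ \
    --gptq-wbits 4 \
    --gptq-groupsize 128
```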

```
╰─$ python llama.py /datadrive/models/Llama-2-13b-chat-hf c4 \
      --wbits 4 --true-sequential --groupsize 128 \
      --save_safetensors /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors
Loading checkpoint shards: 100% 3/3 [00:02
```
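The resulting .safetensors file would then be loaded for inference with `llama_inference.py`, as the next excerpt does with a .pt checkpoint. A sketch carrying over the paths from the command above, assuming `--load` accepts a .safetensors path as well as .pt:

```
# Generate with the quantized weights (paths carried over from above;
# --wbits and --groupsize must match the quantization run).
CUDA_VISIBLE_DEVICES=0 python llama_inference.py /datadrive/models/Llama-2-13b-chat-hf \
    --wbits 4 --groupsize 128 \
    --load /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors \
    --text "this is llama"
```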

Got this error when running llama_inference.py:
```
$ CUDA_VISIBLE_DEVICES=0 python llama_inference.py ${MODEL_DIR} --wbits 4 --groupsize 128 \
    --load llama7b-4bit-128g.pt --text "this is llama"
Loading model ...
Found 3 unique KN Linear...
```

(vicuna) ahnlab@ahnlab-desktop:~/GPT/StarCoder/GPTQ-for-SantaCoder$ python -m santacoder_inference bigcode/starcoderbase --wbits 4 --groupsize 128 --load starcoderbase-GPTQ-4bit-128g/model.pt Loading checkpoint shards: 100% 7/7 [01:12

Loading model ... Found 3 unique KN Linear values. Warming up autotune cache ... 100% 12/12 [00:34