
4-bit quantization of LLaMA using GPTQ
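For context, quantizing a model with this repo is a single invocation of `llama.py`. Here is a minimal sketch, assuming a local Hugging Face checkpoint; the paths are placeholders, and the flags mirror the commands quoted in the issues below:

```
# Quantize a LLaMA checkpoint to 4-bit GPTQ (paths are placeholders).
# c4 is the calibration dataset; --groupsize 128 is the common setting
# seen throughout these issues.
python llama.py ./llama-7b-hf c4 \
    --wbits 4 \
    --true-sequential \
    --groupsize 128 \
    --save_safetensors llama-7b-4bit-128g.safetensors
```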

96 GPTQ-for-LLaMa issues, sorted by most recently updated:

I'm trying to run text-generation-webui on my computer. I'm pretty limited with 8 GB of RAM, but I have an RTX 3060 Ti to run it on. When running 7B...
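For reference, launching text-generation-webui with a GPTQ model generally looked like the following at the time. This is a sketch, assuming a webui version with the old GPTQ command-line flags; the model name is a placeholder, and the flags may differ in newer releases:

```
# Start the webui with a 4-bit GPTQ model (model name is a placeholder;
# these flags are assumed from the webui of this era and may have changed).
# --wbits and --groupsize must match how the checkpoint was quantized.
python server.py --model llama-7b-4bit-128g --wbits 4 --groupsize 128
```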

Hey there. I'm trying to install, following the steps here: https://aituts.com/llama/ I've gotten to Step 3, which says to do the following:
```
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa...
```
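The excerpt is cut off, but the quoted step amounts to cloning this repo into the webui's repositories folder. A minimal sketch, assuming text-generation-webui's root as the working directory (the excerpt's trailing ... may hide extra arguments such as a branch flag):

```
# Run from inside the text-generation-webui folder (assumed).
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
```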

Hi, I'm interested in running LLaMA 4-bit GPTQ, but I don't have a GPU. Is it possible to run this model on CPU only?

Hello, does anyone know what's wrong with this?
```
PS C:\Users\Max\llama\repositories\GPTQ-for-LLaMa> conda info

     active environment : None
        user config file : C:\Users\Max\.condarc
  populated config files :
           conda version : 4.14.0...
```

Hello, and thank you for this project. One thing I don't understand: in the GPTQ vs [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) comparison, what do nf4 and the other listed options mean?

I followed the instructions in https://github.com/lm-sys/FastChat/blob/main/docs/gptq.md and loaded the model, but there is an error when loading it with `python3 -m fastchat.serve.model_worker --model-path models/llama-2-7B-GPTQ --gptq-wbits 4 --gptq-groupsize 128`: 2024-02-12 21:31:13...
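Broken across lines for readability, the failing invocation from the excerpt is:

```
# FastChat worker invocation exactly as quoted above.
python3 -m fastchat.serve.model_worker \
    --model-path models/llama-2-7B-GPTQ \
    --gptq-wbits 4 \
    --gptq-groupsize 128
```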

```
╰─$ python llama.py /datadrive/models/Llama-2-13b-chat-hf c4 \
      --wbits 4 --true-sequential --groupsize 128 \
      --save_safetensors /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors
Loading checkpoint shards: 100% 3/3 [00:02
```
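The resulting .safetensors file would then be loaded for inference with `llama_inference.py`, as the next excerpt does with a .pt checkpoint. A sketch carrying over the paths from the command above, assuming `--load` accepts a .safetensors path as well as .pt:

```
# Generate with the quantized weights (paths carried over from above;
# --wbits and --groupsize must match the quantization run).
CUDA_VISIBLE_DEVICES=0 python llama_inference.py /datadrive/models/Llama-2-13b-chat-hf \
    --wbits 4 --groupsize 128 \
    --load /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors \
    --text "this is llama"
```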

Got this error when running llama_inference.py:
```
$ CUDA_VISIBLE_DEVICES=0 python llama_inference.py ${MODEL_DIR} --wbits 4 --groupsize 128 \
    --load llama7b-4bit-128g.pt --text "this is llama"
Loading model ...
Found 3 unique KN Linear...
```

(vicuna) ahnlab@ahnlab-desktop:~/GPT/StarCoder/GPTQ-for-SantaCoder$ python -m santacoder_inference bigcode/starcoderbase --wbits 4 --groupsize 128 --load starcoderbase-GPTQ-4bit-128g/model.pt Loading checkpoint shards: 100% 7/7 [01:12

Loading model ... Found 3 unique KN Linear values. Warming up autotune cache ... 100% 12/12 [00:34