GPTQ-for-LLaMa
4 bits quantization of LLaMa using GPTQ
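The tagline refers to storing LLaMa weights as 4-bit integers with per-group scales. As a rough illustration of what that storage format means (this is a toy sketch with made-up helper names, not the GPTQ algorithm itself, which additionally minimizes each layer's output error during quantization):

```python
# Toy sketch of group-wise 4-bit quantization: each group of float
# weights is mapped to integers 0..15 with one scale and zero-point.
# Illustration only; function names here are not from the repo.

def quantize_group(weights):
    """Quantize a group of floats to 4-bit codes with a shared scale/zero."""
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / 15 or 1.0  # 15 = 2**4 - 1 quantization steps
    zero = round(-wmin / scale)        # integer offset so wmin maps near 0
    q = [max(0, min(15, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Recover approximate floats from 4-bit codes."""
    return [(qi - zero) * scale for qi in q]

weights = [0.12, -0.03, 0.40, -0.25, 0.07, 0.33, -0.18, 0.01]
q, scale, zero = quantize_group(weights)
restored = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert all(0 <= qi <= 15 for qi in q)
assert max_err <= scale  # reconstruction error is within one step
```

The `--groupsize 128` flag seen in the commands below controls how many weights share one scale/zero pair; smaller groups cost more storage but reduce this reconstruction error.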
I'm trying to run text-generation-webui on my computer. I'm pretty limited with 8 GB of RAM, but I have an RTX 3060 Ti I'm trying to run it on. When running 7B...
Hey there. I'm trying to install, following the steps here: https://aituts.com/llama/ I've gotten to Step 3, which says to do the following:
```
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa...
```
Hi, I'm interested in running LLaMA 4-bit GPTQ, but I don't have a GPU. Is it possible to run this model on CPU only?
Hello, does anyone know what's wrong with this?
```
PS C:\Users\Max\llama\repositories\GPTQ-for-LLaMa> conda info
     active environment : None
        user config file : C:\Users\Max\.condarc
  populated config files :
           conda version : 4.14.0...
```
Hello, and thank you for this project. There is one thing I don't understand: in the GPTQ vs [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) comparison, what do nf4 and the other items listed there mean?
I followed the instructions in https://github.com/lm-sys/FastChat/blob/main/docs/gptq.md and loaded the model, but there is an error when loading it with `python3 -m fastchat.serve.model_worker --model-path models/llama-2-7B-GPTQ --gptq-wbits 4 --gptq-groupsize 128`: 2024-02-12 21:31:13...
```
╰─$ python llama.py /datadrive/models/Llama-2-13b-chat-hf c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors
Loading checkpoint shards: 100%|██████████| 3/3 [00:02
```
Got this error when running llama_inference.py:
```
$ CUDA_VISIBLE_DEVICES=0 python llama_inference.py ${MODEL_DIR} --wbits 4 --groupsize 128 --load llama7b-4bit-128g.pt --text "this is llama"
Loading model ...
Found 3 unique KN Linear...
```
```
(vicuna) ahnlab@ahnlab-desktop:~/GPT/StarCoder/GPTQ-for-SantaCoder$ python -m santacoder_inference bigcode/starcoderbase --wbits 4 --groupsize 128 --load starcoderbase-GPTQ-4bit-128g/model.pt
Loading checkpoint shards: 100%|██████████| 7/7 [01:12
```
```
Loading model ... Found 3 unique KN Linear values.
Warming up autotune cache ... 100%|██████████| 12/12 [00:34
```