
Error when starting the 4-bit model

Open tareaps opened this issue 2 years ago • 4 comments

Describe the bug

I found and downloaded the LLaMA model from https://huggingface.co/elinas/llama-30b-int4/tree/main. All files are in the folder text-generation-webui\models\llama30b. When I run it, it says the CUDA extension is not installed, even though I installed it, and it also can't find groupsize.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Screenshot

No response

Logs

C:\text-generation-webui>python server.py --gptq-bits 4 --gptq-model-type llama
Loading llama30b...
CUDA extension not installed.
Traceback (most recent call last):
  File "C:\text-generation-webui\server.py", line 243, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\text-generation-webui\modules\models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "C:\text-generation-webui\modules\GPTQ_loader.py", line 64, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'

System Info

64 GB RAM

tareaps avatar Mar 23 '23 10:03 tareaps
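For context, this TypeError means the installed GPTQ-for-LLaMa defines load_quant with a fourth required parameter, groupsize, that the webui's loader at this commit does not pass. A minimal sketch of the mismatch, using simplified, hypothetical stand-in signatures (the real functions load quantized model weights):

```python
# Hypothetical stand-in: newer GPTQ-for-LLaMa requires a 'groupsize' argument.
def load_quant(model, checkpoint, wbits, groupsize):
    return (model, checkpoint, wbits, groupsize)

# The older webui loader only passes three arguments, so the call fails:
try:
    load_quant("llama30b", "llama30b-4bit.pt", 4)
except TypeError as e:
    print(e)  # load_quant() missing 1 required positional argument: 'groupsize'
```

Rolling GPTQ-for-LLaMa back to a commit whose load_quant takes three arguments (as suggested below) removes the mismatch.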

Maybe this is the wrong version of LLaMA; if so, please tell me a working version.

tareaps avatar Mar 23 '23 10:03 tareaps

Run git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4 inside the GPTQ-for-LLaMa folder.

jllllll avatar Mar 23 '23 12:03 jllllll

Don't forget to recompile the CUDA kernel:

python setup_cuda.py install

Ph0rk0z avatar Mar 23 '23 13:03 Ph0rk0z

Worked here. Thank you!

Fortyseven avatar Mar 24 '23 04:03 Fortyseven

Hi! I'm getting a different error in this case:

Loading alpaca-13b-lora-int4...
Traceback (most recent call last):
  File "/home/text-generation/server.py", line 275, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/text-generation/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/home/text-generation/modules/GPTQ_loader.py", line 78, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize)
TypeError: load_quant() takes 3 positional arguments but 4 were given

I tried running git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4 inside the GPTQ-for-LLaMa folder, but it had no effect.

kreolsky avatar Mar 27 '23 21:03 kreolsky
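This is the reverse of the original mismatch: after the reset, the installed load_quant accepts only three positional arguments, while the updated webui now passes groupsize as a fourth. Again a sketch with hypothetical simplified signatures:

```python
# Hypothetical stand-in: after the reset, load_quant only takes three arguments.
def load_quant(model, checkpoint, wbits):
    return (model, checkpoint, wbits)

# ...but the updated webui loader passes four, so the call fails:
try:
    load_quant("alpaca-13b-lora-int4", "alpaca-4bit.pt", 4, 128)
except TypeError as e:
    print(e)  # load_quant() takes 3 positional arguments but 4 were given
```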


The webui currently uses the latest GPTQ-for-LLaMa. Don't use git reset --hard 468c47c01b4fe370616747b6d69a2d3f48bab5e4.

jllllll avatar Mar 27 '23 21:03 jllllll
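One quick way to tell which variant of GPTQ-for-LLaMa is installed is to inspect the load_quant signature and check whether it expects groupsize. This is a hypothetical diagnostic; in a real setup you would import load_quant from the GPTQ-for-LLaMa checkout (e.g. from its llama.py) instead of the stand-in defined here:

```python
import inspect

# Stand-in for the real load_quant imported from GPTQ-for-LLaMa:
def load_quant(model, checkpoint, wbits, groupsize):
    ...

params = list(inspect.signature(load_quant).parameters)
print(params)  # ['model', 'checkpoint', 'wbits', 'groupsize']

# If 'groupsize' is missing here while the webui passes it (or vice versa),
# the two checkouts are out of sync: update both to matching versions.
```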

The model he is loading won't work with the current GPTQ afaik.

Ph0rk0z avatar Mar 27 '23 21:03 Ph0rk0z

Make sure to check this link for up-to-date information. I keep it updated.

https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode

oobabooga avatar Mar 29 '23 02:03 oobabooga