
wbit=16 Conversion Gives Error

Open sawradip opened this issue 2 years ago • 2 comments

When I try to run the quantization pipeline for 16-bit precision,

CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-hf/llama-7b c4 --wbits 16 --true-sequential --act-order --save llama7b-16bit.pt

It raises a NameError because quantizers is not defined:

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [01:23<00:00, 41.70s/it]
Found cached dataset json (/home/sawradip/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/sawradip/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
  File "/mnt/c/Users/Sawradip/Desktop/practice_code/practice_llm/GPTQ-for-LLaMa/llama.py", line 480, in <module>
    llama_pack(model, quantizers, args.wbits, args.groupsize)
NameError: name 'quantizers' is not defined

The llama.py file only defines quantizers when wbits < 16:

    if not args.load and args.wbits < 16 and not args.nearest:
        tick = time.time()
        quantizers = llama_sequential(model, dataloader, DEV)
        print(time.time() - tick)

which is expected, since quantizers isn't needed for 16 bit. But I think this case should be handled more gracefully, since wbits=16 is already an accepted value.
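For example, a guard like this around the pack/save step would skip packing when nothing was quantized. This is just a rough sketch using the names from the traceback above; I'm assuming --save ultimately goes through torch.save on the state dict, which I haven't verified against the full script:

    if args.save:
        if args.wbits < 16:
            # quantizers only exists when llama_sequential() actually ran
            llama_pack(model, quantizers, args.wbits, args.groupsize)
        # assumption: for wbits == 16 there is nothing to pack, so just save
        # the (already fp16) weights directly
        torch.save(model.state_dict(), args.save)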

What's your take on that? @qwopqwop200

sawradip avatar Apr 29 '23 05:04 sawradip

Isn't 16-bit the original model precision? I guess there is no need to pass the wbits argument for the original model.

yhyu13 avatar Apr 30 '23 11:04 yhyu13

Write a Python script to convert from FP32 to FP16; don't use GPTQ.
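For reference, a minimal sketch of that approach with transformers. The input path reuses ./llama-hf/llama-7b from the command above; the output path is arbitrary:

    import torch
    from transformers import AutoModelForCausalLM

    # Load the HF-format checkpoint and cast the weights to FP16 on load.
    model = AutoModelForCausalLM.from_pretrained(
        "./llama-hf/llama-7b", torch_dtype=torch.float16
    )

    # Save the FP16 copy back out in the same HF format.
    model.save_pretrained("./llama-hf/llama-7b-fp16")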

Ph0rk0z avatar May 13 '23 14:05 Ph0rk0z