
Meaningless Prediction in 13B 2bit

axenov opened this issue • 3 comments

I have quantized the 13B model to 2bit by executing:

python -m llama.llama_quant decapoda-research/llama-13b-hf c4 --wbits 2 --save pyllama-13B2b.pt

After quantization, when I run the test inference, the output seems completely random:

python quant_infer.py --model decapoda-research/llama-13b-hf --wbits 2 --load ../pyllama-13B2b.pt --text "the meaning of life is" --max_length 24 --cuda cuda:0

[Screenshot from 2023-03-24 22-44-32]

axenov — Mar 24 '23 21:03

Have you tried tuning it with groupsize 128/256 and other options like asymmetric/symmetric quantization, or MSE?
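For intuition on why these knobs matter so much at 2 bits: with `wbits=2` there are only four representable levels per group, so the reconstruction error is dominated by the per-group scale. Below is a minimal NumPy sketch of group-wise fake quantization (not pyllama's actual implementation; the function name and the symmetric/asymmetric formulas are illustrative assumptions) showing how a smaller groupsize reduces round-trip error:

```python
import numpy as np

def fake_quant_2bit(w, groupsize=128, symmetric=False):
    """Round-trip a 1-D weight array through 2-bit (4-level) group
    quantization and return the dequantized values.

    Hypothetical sketch, not pyllama's code: symmetric uses integer
    levels {-2, -1, 0, 1}; asymmetric uses levels {0..3} with a
    per-group zero point.
    """
    g = w.reshape(-1, groupsize)
    if symmetric:
        # scale so the largest magnitude maps near the integer range edge
        scale = np.abs(g).max(axis=1, keepdims=True) / 2.0
        scale[scale == 0] = 1.0  # avoid div-by-zero for all-zero groups
        q = np.clip(np.round(g / scale), -2, 1)
        deq = q * scale
    else:
        lo = g.min(axis=1, keepdims=True)
        hi = g.max(axis=1, keepdims=True)
        scale = (hi - lo) / 3.0  # 4 levels span the [min, max] range
        scale[scale == 0] = 1.0
        zero = np.round(-lo / scale)
        q = np.clip(np.round(g / scale) + zero, 0, 3)
        deq = (q - zero) * scale
    return deq.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(8192).astype(np.float32)
for gs in (32, 128, 256):
    err = float(np.mean((w - fake_quant_2bit(w, gs)) ** 2))
    print(f"groupsize={gs:4d}  mse={err:.4f}")
```

Smaller groups adapt their scale to a narrower local range, so the per-value error shrinks; the trade-off is more scale/zero-point metadata to store. Even so, 2-bit error stays large in absolute terms, which is consistent with the garbled generations reported here.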

juncongmoo — Mar 24 '23 22:03

Okay, I will try the other options. So far I have only tried groupsize 128, and the result was even worse.

axenov — Mar 24 '23 23:03

Several people are complaining about garbage in the output here: #58.

sskorol — Apr 08 '23 22:04