pyllama
Meaningless Prediction in 13B 2bit
I have quantized the 13B model to 2bit by executing:
python -m llama.llama_quant decapoda-research/llama-13b-hf c4 --wbits 2 --save pyllama-13B2b.pt
After quantization, when I run the test inference, the output seems completely random:
python quant_infer.py --model decapoda-research/llama-13b-hf --wbits 2 --load ../pyllama-13B2b.pt --text "the meaning of life is" --max_length 24 --cuda cuda:0
Have you tried tuning it with groupsize 128/256... and other options like asymmetric/symmetric quantization, or MSE?
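To illustrate why those options matter, here is a minimal pure-Python sketch (not pyllama's actual GPTQ code, and the `quant_dequant_2bit` helper below is hypothetical) of round-tripping weights through 2-bit (4-level) quantization. With only one scale per tensor, large outliers stretch the range and most weights collapse onto a couple of levels; a smaller group size keeps each group's range tight, which is why groupsize 128/256 and the symmetric/asymmetric choice can change the result so much:

```python
import random

def quant_dequant_2bit(weights, group_size, symmetric=True):
    """Round-trip floats through 2-bit quantization: one scale
    (and zero-point, if asymmetric) per group of group_size values."""
    out = []
    for start in range(0, len(weights), group_size):
        g = weights[start:start + group_size]
        if symmetric:
            # Symmetric: 4 integer levels {-2, -1, 0, 1} around zero.
            scale = max(abs(w) for w in g) / 2 or 1e-8
            q = [max(-2, min(1, round(w / scale))) for w in g]
            out.extend(v * scale for v in q)
        else:
            # Asymmetric: min-max range mapped onto levels {0, 1, 2, 3}.
            lo, hi = min(g), max(g)
            scale = (hi - lo) / 3 or 1e-8
            q = [max(0, min(3, round((w - lo) / scale))) for w in g]
            out.extend(v * scale + lo for v in q)
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(4096)]

# One scale for the whole tensor vs. one scale per 128-value group.
err_full = mse(w, quant_dequant_2bit(w, len(w), symmetric=False))
err_g128 = mse(w, quant_dequant_2bit(w, 128, symmetric=False))
print("per-tensor MSE:", err_full)
print("groupsize-128 MSE:", err_g128)
```

On this toy Gaussian data the group-wise error comes out lower, but at 2 bits even the best settings leave very little headroom, which is consistent with the garbage output people are seeing.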
Okay, I will try the other options. So far I have only tried groupsize 128, and the result was even worse.
Several people are complaining about garbage output here: #58.