auto-round
Unexpected ppl diff
I'm trying to quantize llama2-7b under the w4a16g128 setting.
The script is:

```shell
python3 main.py \
  --model_name /mnt/bn/wyh-train/4bit/models/llama2-7b/model \
  --device 0 \
  --group_size 128 \
  --bits 4 \
  --iters 1000 \
  --deployment_device 'fake,cpu,gpu' \
  --output_dir "/mnt/bn/wyh-train/4bit/models/llama2-7b-auto-round"
```
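For context on what the w4a16g128 setting means (4-bit weights, 16-bit activations, per-group scales with group_size=128), here is a minimal sketch of plain group-wise round-to-nearest quantization. This is only an illustration of the weight format, not auto-round's actual tuning algorithm (which optimizes the rounding over `--iters` steps); the function name and shapes are my own.

```python
# Hypothetical sketch (NOT auto-round's algorithm): naive asymmetric 4-bit
# round-to-nearest quantization with group_size=128, to illustrate the
# w4a16g128 weight layout (one scale/zero-point per group of 128 weights).
import numpy as np

def quantize_groupwise(w, bits=4, group_size=128):
    """Quantize a 1-D float vector group-by-group; return the dequantized copy."""
    qmax = 2 ** bits - 1  # 15 levels above zero for 4-bit
    out = np.empty_like(w)
    for start in range(0, len(w), group_size):
        g = w[start:start + group_size]
        lo, hi = g.min(), g.max()
        scale = (hi - lo) / qmax if hi > lo else 1.0
        # Integer codes in [0, 15], then dequantize back to float
        q = np.clip(np.round((g - lo) / scale), 0, qmax)
        out[start:start + group_size] = q * scale + lo
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
wq = quantize_groupwise(w)
print("max abs reconstruction error:", np.abs(w - wq).max())
```

The per-group max error is bounded by half the group's scale, so outlier weights inside a group inflate the error for the whole group; auto-round and similar methods exist precisely to do better than this naive rounding.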
The result is:

| Model | wikitext2 | c4 |
| --- | --- | --- |
| llama2-7b-fp16 | 5.4721 | 6.9727 |
| llama2-7b-w4a16g128 (auto_round) | 10.4401 | 7.4204 |
The wikitext2 perplexity nearly doubles while c4 degrades only slightly. Any insight here?