INT4-AWQ PPL results for LLaMA-2 model are not as expected
Hi, I have a question about why my INT4-AWQ PPL results for the LLaMA-2 model differ so much from the paper. The paper reports a PPL of 5.60 for LLaMA-2 + INT4-AWQ, but I got 16.38. Can someone confirm whether my 16.38 result is expected, and explain why the results are so different?
Implementation steps (the commands I used for the first three steps are shown after this list):
- Perform the AWQ search and save the search results (already saved in awq_cache)
- Evaluate the AWQ-quantized model on WikiText-2 (simulated pseudo-quantization)
- Generate the real quantized weights (INT4)
- Load and evaluate the real quantized model (GPU memory usage should now be lower)
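
For reference, here is a sketch of the commands for the first three steps, following the pattern from the llm-awq README; the `awq_cache/` scale filename is my local path and is only illustrative:

```bash
# Step 1: run the AWQ search and dump the resulting scales to awq_cache/
python -m awq.entry --model_path llama-2-7b-hf \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/llama-2-7b-w4-g128.pt

# Step 2: evaluate on WikiText-2 with simulated (fake) quantization
python -m awq.entry --model_path llama-2-7b-hf \
    --tasks wikitext \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq_cache/llama-2-7b-w4-g128.pt \
    --q_backend fake

# Step 3: generate real INT4 quantized weights and dump them to quant/
python -m awq.entry --model_path llama-2-7b-hf \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq_cache/llama-2-7b-w4-g128.pt \
    --q_backend real --dump_quant quant/llama-2-7b-hf-w4-g128-awq.pt
```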
Step 4 (load and evaluate the real quantized model):

```bash
python -m awq.entry --model_path llama-2-7b-hf \
    --tasks wikitext \
    --w_bit 4 --q_group_size 128 \
    --load_quant quant/llama-2-7b-hf-w4-g128-awq.pt
```
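
As a sanity check, it might help to compare against the FP16 baseline first. If I read `awq/entry.py` correctly, omitting the quantization flags evaluates the unquantized model; this is an assumption on my part, not something the README states explicitly:

```bash
# Assumed FP16 baseline evaluation: no quantization flags passed,
# so the model should be evaluated as-is (please verify against entry.py)
python -m awq.entry --model_path llama-2-7b-hf --tasks wikitext
```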