
INT4-AWQ PPL results for LLaMA-2 model are not as expected

xianwujie opened this issue on Nov 03, 2023 · 9 comments

Hi, I have a question about why the INT4-AWQ PPL results for the LLaMA-2 model are so different from the paper. The paper reports a PPL of 5.60 for LLaMA-2 + INT4-AWQ, but I got a PPL of 16.38. I want to confirm whether my 16.38 result is expected, and understand why the results differ so much.

Implementation steps (command sketches for steps 1–3 are included after this list):

  1. Perform the AWQ search and save the search results (already done; results saved in `awq_cache`)
  2. Evaluate the AWQ-quantized model on WikiText-2 (simulated pseudo-quantization)
  3. Generate the real quantized weights (INT4)
  4. Load and evaluate the real quantized model (now you can see smaller GPU memory usage):

```bash
python -m awq.entry --model_path llama-2-7b-hf \
    --tasks wikitext \
    --w_bit 4 --q_group_size 128 \
    --load_quant quant/llama-2-7b-hf-w4-g128-awq.pt
```
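For reference, steps 1–3 correspond to commands along these lines (flag names follow the llm-awq README; the exact cache and output filenames here are illustrative):

```bash
# Step 1: run the AWQ search and dump the resulting scales to awq_cache
python -m awq.entry --model_path llama-2-7b-hf \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/llama-2-7b-hf-w4-g128.pt

# Step 2: evaluate WikiText-2 PPL with simulated (pseudo) quantization
python -m awq.entry --model_path llama-2-7b-hf \
    --tasks wikitext \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq_cache/llama-2-7b-hf-w4-g128.pt \
    --q_backend fake

# Step 3: dump the real INT4 quantized weights for later loading
python -m awq.entry --model_path llama-2-7b-hf \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq_cache/llama-2-7b-hf-w4-g128.pt \
    --q_backend real --dump_quant quant/llama-2-7b-hf-w4-g128-awq.pt
```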
