[Question] In the Introduction_to_Weight_Quantization article, the calculation in the `calculate_perplexity` section seems wrong
Really enjoyed your clear explanation of weight quantization 🥰
But I have a question about the perplexity comparison in the `calculate_perplexity` section.
In the article, perplexity is calculated using each model's own generated output:
```python
ppl = calculate_perplexity(model, original_text)        # Model evaluates its OWN output
ppl_abs = calculate_perplexity(model_abs, absmax_text)  # Quantized model evaluates its OWN output
ppl_zp = calculate_perplexity(model_zp, absmax_text)    # Zero-point model evaluates ANOTHER model's output
```
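For context, my understanding is that `calculate_perplexity` is roughly the following sketch (assuming the GPT-2 model and `tokenizer` set up earlier in the article): it just exponentiates the model's mean cross-entropy loss on the given text, so whichever text is passed in directly determines the score.

```python
import torch

def calculate_perplexity(model, text):
    # Tokenize and use the same token ids as both inputs and labels
    encodings = tokenizer(text, return_tensors="pt").to(model.device)
    input_ids = encodings.input_ids
    target_ids = input_ids.clone()

    # Forward pass without gradients; with labels provided, the model returns
    # the mean cross-entropy loss (negative log-likelihood per token)
    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)

    # Perplexity is the exponential of the mean negative log-likelihood
    return torch.exp(outputs.loss)
```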
For more comparable results, should we instead evaluate all models on:
- The same input prompt ("I have a dream"), or
- A standard validation dataset?
e.g.:
```python
reference_text = "I have a dream"  # or text from a shared validation set
ppl_orig = calculate_perplexity(model, reference_text)
ppl_abs = calculate_perplexity(model_abs, reference_text)
ppl_zp = calculate_perplexity(model_zp, reference_text)
```
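Or, if a standard validation set is preferred over a single prompt, something along these lines could work (just a rough sketch, assuming the `datasets` library is available and `model`, `model_abs`, `model_zp` are the three models from the notebook):

```python
from datasets import load_dataset

# Score every model on the same held-out text (a small slice of WikiText-2
# here) so the perplexities are directly comparable
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
reference_text = "\n\n".join(test["text"])[:2000]  # short chunk for a quick check

for name, m in [("original", model), ("absmax", model_abs), ("zeropoint", model_zp)]:
    ppl = calculate_perplexity(m, reference_text)
    print(f"{name:>9} perplexity: {ppl.item():.2f}")
```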
Thanks @yrom! You're completely right, this is a mistake. They should all use the same text to make a proper comparison. I will fix this asap.
I would like to contribute