[Question] In the Introduction_to_Weight_Quantization article, the calculation in the `calculate_perplexity` section seems wrong
Really enjoyed your clear explanation of weight quantization 🥰
But I have a question about the perplexity comparison in the `calculate_perplexity` section.
In the article, perplexity is calculated using each model's own generated output:
```python
ppl = calculate_perplexity(model, original_text)        # Model evaluates its OWN output
ppl_abs = calculate_perplexity(model_abs, absmax_text)  # Quantized model evaluates its OWN output
ppl_zp = calculate_perplexity(model_zp, absmax_text)    # Zero-point model evaluates ANOTHER model's output
```
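For context, my understanding is that `calculate_perplexity` is roughly the following sketch (assuming the GPT-2 model and `tokenizer` set up earlier in the article): it just exponentiates the model's mean cross-entropy loss on the given text, so whichever text is passed in directly determines the score.

```python
import torch

def calculate_perplexity(model, text):
    # Tokenize and use the same token ids as both inputs and labels
    encodings = tokenizer(text, return_tensors="pt").to(model.device)
    input_ids = encodings.input_ids
    target_ids = input_ids.clone()

    # Forward pass without gradients; with labels provided, the model returns
    # the mean cross-entropy loss (negative log-likelihood per token)
    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)

    # Perplexity is the exponential of the mean negative log-likelihood
    return torch.exp(outputs.loss)
```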
For more comparable results, should we instead evaluate all models on:
- The same input prompt ("I have a dream"), or
- A standard validation dataset?
e.g.:
```python
reference_text = "I have a dream"  # or text from a shared validation set
ppl_orig = calculate_perplexity(model, reference_text)
ppl_abs = calculate_perplexity(model_abs, reference_text)
ppl_zp = calculate_perplexity(model_zp, reference_text)
```
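Or, if a standard validation set is preferred over a single prompt, something along these lines could work (just a rough sketch, assuming the `datasets` library is available and `model`, `model_abs`, `model_zp` are the three models from the notebook):

```python
from datasets import load_dataset

# Score every model on the same held-out text (a small slice of WikiText-2
# here) so the perplexities are directly comparable
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
reference_text = "\n\n".join(test["text"])[:2000]  # short chunk for a quick check

for name, m in [("original", model), ("absmax", model_abs), ("zeropoint", model_zp)]:
    ppl = calculate_perplexity(m, reference_text)
    print(f"{name:>9} perplexity: {ppl.item():.2f}")
```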
Thanks @yrom! You're completely right, this is a mistake. They should all use the same text to make a proper comparison. I will fix this asap.
I would like to contribute