
feat: add perplexity example

AlpinDale opened this issue 1 year ago · 0 comments

This PR adds an example that calculates model perplexity using prompt_logprobs:

  • Run a dataset through the model and extract one prompt logprob per token.
  • Calculate the mean of the logprobs.
  • Exponentiate the negative mean of the logprobs.
  • Divide the result by 2.
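The steps above can be sketched as a small helper. Note this is a minimal illustration of the math, not the actual example in this PR; `token_logprobs` stands in for the per-token prompt logprobs collected from the engine, and the `halve` flag corresponds to the ad-hoc division by two described above:

```python
import math

def perplexity(token_logprobs: list[float], halve: bool = True) -> float:
    """Compute perplexity from per-token logprobs.

    Steps: mean the logprobs, exponentiate the negative mean,
    and (optionally, to match llama.cpp in this example) halve
    the result.
    """
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    ppl = math.exp(-mean_lp)
    return ppl / 2 if halve else ppl
```

Without the halving step, a sequence whose tokens each have probability 0.5 (logprob ln 0.5) yields a perplexity of exactly 2, which is the standard definition.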

The last step should not be mathematically necessary, but it appears to be required to match the perplexity results from llama.cpp. The results still seem unreliable: Llama-2 7B shows a lower perplexity than Mistral 7B, and enabling the FP8 KV cache does not change perplexity at all.

AlpinDale · Mar 12 '24 19:03