lm-evaluation-harness
How to calculate the "token_perplexity"
The harness currently has metrics called "byte_perplexity" and "word_perplexity", which normalize the perplexity by the number of bytes and words in the target, respectively. If we want to normalize the perplexity by tokens instead (i.e., by the number of tokens produced by the tokenizer), how can we calculate that effectively?
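For context, my understanding is that these perplexities are all aggregated the same way and only the normalizing count changes; something like the sketch below (the function name is just illustrative, not the harness's exact implementation):

```python
import math

def normalized_perplexity(pairs):
    """Aggregate (log-likelihood, unit_count) pairs into a perplexity.

    The unit count is what differs between the metrics: bytes for
    byte_perplexity, words for word_perplexity, and it would be
    tokens for a hypothetical token_perplexity.
    """
    total_ll = sum(ll for ll, _ in pairs)
    total_units = sum(n for _, n in pairs)
    return math.exp(-total_ll / total_units)
```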
I tried modifying the process_results function in the ConfigurableTask class, but I found it difficult to obtain the token length of the target there. Does anyone have a good idea for how to calculate it?
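One workaround I have been considering (untested, and the tokenizer name below is just a placeholder) is to load the same tokenizer the model uses inside the task and count the target's tokens directly in process_results, returning the usual (log-likelihood, count) pair under a new metric key:

```python
from transformers import AutoTokenizer

# Placeholder: this should be the same tokenizer as the evaluated model
tokenizer = AutoTokenizer.from_pretrained("your-model-name")

def process_results(self, doc, results):
    # For loglikelihood_rolling tasks, results holds a single log-likelihood
    (loglikelihood,) = results
    target = self.doc_to_target(doc)
    # Token count of the target, excluding special tokens
    num_tokens = len(tokenizer.encode(target, add_special_tokens=False))
    return {
        # Same (log-likelihood, weight) convention as word/byte perplexity
        "token_perplexity": (loglikelihood, num_tokens),
    }
```

The awkward part is that the task does not know in advance which model (and therefore which tokenizer) will be evaluated, so hard-coding a tokenizer like this feels fragile; having the model side return the token count alongside the log-likelihood might be cleaner.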