
Introduce perplexity per token in loglikelihood_rolling

Open dtamayo-nlp opened this issue 1 year ago • 2 comments

Hi,

Thanks for your excellent work in driving this project forward.

I am interested in adding the task described in the YaRN paper for computing perplexity over long contexts. However, I find that when using `loglikelihood_rolling`, the only accepted metrics are:

  • word_perplexity
  • byte_perplexity
  • bits_per_byte
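For context, these aggregate metrics all share the same shape and differ only in the normalizing denominator; a token-normalized perplexity just swaps in the token count. A minimal sketch (the function name and the numbers are illustrative, not the harness's actual code):

```python
import math

def perplexity(total_loglikelihood, num_units):
    """Generic perplexity: exp of the average negative log-likelihood
    (natural log) per normalizing unit."""
    return math.exp(-total_loglikelihood / num_units)

# Assumed example document: total log-likelihood -120.0,
# 40 words, 200 bytes, 60 tokens.
total_ll = -120.0
word_ppl = perplexity(total_ll, 40)    # word_perplexity: normalize by words
byte_ppl = perplexity(total_ll, 200)   # byte_perplexity: normalize by bytes
token_ppl = perplexity(total_ll, 60)   # proposed: normalize by token count
bits_per_byte = -total_ll / (200 * math.log(2))  # same quantity in bits/byte
```

The point of the token-normalized variant is that it is directly comparable across papers that report per-token perplexity, whereas word- and byte-normalized values depend on the tokenizer-independent unit chosen.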

I have not yet managed to add a `sliding_window` variable, but I believe I have managed to introduce a token-normalized perplexity alongside the previous implementations. I hope it helps!
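For anyone picking up the missing `sliding_window` piece, the chunking it would need could look roughly like this. This is a hypothetical sketch over a flat list of token ids, not the harness's actual rolling utility (which uses non-overlapping windows); a `stride` smaller than `window` gives the overlapping variant:

```python
def sliding_windows(tokens, window, stride):
    """Split `tokens` into windows of up to `window` ids, advancing by
    `stride` each time. With stride < window, consecutive windows overlap;
    in a sliding-window evaluation only the tokens not already covered by
    the previous window would contribute new log-likelihood mass."""
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return windows
```

For example, `sliding_windows(list(range(10)), window=4, stride=2)` produces four windows, each shifted two tokens past the last.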

To test that it works, I followed these steps:

  • Download the repo.
  • Create a virtual environment with python -m venv my_env.
  • Activate it with source my_env/bin/activate and install with pip install -e .
  • Execute:
model=  # insert the model you want to evaluate here
few_shot=0
tensor_parallelism=False
num_samples=2
dataset=proof-pile
output_dir=results/$(basename ${model})/${few_shot}-shot/results:$(basename ${model}):${dataset}:${few_shot}-shot.json

CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes=1 -m lm_eval --model hf \
    --model_args pretrained=$model,trust_remote_code=True \
    --tasks ${dataset} \
    --num_fewshot $few_shot \
    --batch_size 1 \
    --output_path $output_dir \
    --log_samples \
    --seed 1234 \
    --limit $num_samples

(The results seem to align with the Hugging Face tests.)

dtamayo-nlp avatar Jul 23 '24 17:07 dtamayo-nlp

Do you plan to merge this at some point or does anything speak against that? From my understanding, most data mixing papers report the perplexity per token, so I am actually surprised to see this is currently not integrated into the eval harness.

MaxiBoether avatar Nov 23 '24 23:11 MaxiBoether

Hi, I apologize for leaving this PR unfinished; I had to take on another project and it has been hard to find the time. I still have to solve some minor issues with the unit tests, which I can try to address next month. That said, it is not a very elegant solution: access to tokens is limited and this was written as a patch, so future features may require changing this code anyway. I am therefore not sure whether this PR will ultimately be accepted.

dtamayo-nlp avatar Nov 25 '24 08:11 dtamayo-nlp