lighteval Expose a few model predictions / gold answers in the logs

Open lewtun opened this issue 4 months ago • 1 comment

For generative benchmarks like MATH, GSM8K, and IFEval, it would be great to have some visibility in the logs into how the prompts are formatted, what the generations look like, what the gold answer is, etc.

Currently, the best approach I've found is to first run the benchmark with --max_samples and then manually inspect the details Parquet file. However, this is rather cumbersome, especially when launching many evals in parallel :)

Perhaps we can store the first N examples in the logs?

Apr 21 '24 18:04 lewtun
