lm-evaluation-harness

add new truncation strategy

Open artemorloff opened this issue 5 months ago • 3 comments

New feature - "smart truncation". This PR:

  • allows for different truncation modes, not only cutting off ALL tokens from the left (currently the only available option, which also does not respect special tokens like BOS for Gemma: the BOS token is the first to be truncated); see the sketch after this list
  • truncation is performed while constructing the task, so the user knows about it before the sequences are fed into the model
  • adds a notification system that tells users how many requests have been truncated, and to what extent. This helps judge whether the metrics are trustworthy: if only one token of each test question remains, the estimate is invalid - the model has not even seen the real test questions
  • logging would record the sequences the LLM actually received, not the full requests from the tasks. Logging is meant to capture the real requests passed into the model, not content the LLM may never have seen
  • suggests character-level truncation for APIs with no tokenizer. Until now, such APIs could not be evaluated with the harness if even one task sample was too long (see the fallback in the sketch below)
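
To make the idea concrete, here is a minimal sketch of two of the proposed modes: left-truncation that keeps the tokenizer's BOS token, and a character-level fallback for backends with no tokenizer. The function names and signatures are illustrative assumptions, not the harness's actual API.

```python
# Hypothetical helpers illustrating the proposed truncation modes.
from typing import List, Optional


def truncate_left_keep_bos(token_ids: List[int], max_length: int,
                           bos_token_id: Optional[int] = None) -> List[int]:
    """Drop tokens from the left, but re-attach BOS if the sequence starts with one."""
    if len(token_ids) <= max_length:
        return token_ids
    if bos_token_id is not None and token_ids and token_ids[0] == bos_token_id:
        keep = max(max_length - 1, 0)          # reserve one slot for BOS
        tail = token_ids[-keep:] if keep else []
        return [bos_token_id] + tail
    return token_ids[-max_length:]


def truncate_left_chars(text: str, max_chars: int) -> str:
    """Character-level fallback when the API backend exposes no tokenizer."""
    return text if len(text) <= max_chars else text[-max_chars:]
```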

Right now the fewshot strategy is implemented and tested. It allows entire few-shot examples to be dropped while respecting special tokens. The returned status contains either an error string or the number of few-shot examples to be kept (a contract sketched below).
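
A rough sketch of that contract, assuming hypothetical names (`TruncationStatus`, `truncate_fewshots`, `count_tokens`) rather than the actual classes in the PR: whole few-shot examples are dropped from the left until the prompt fits, and the status reports how many survived or why nothing fits.

```python
# Illustrative contract for the fewshot truncation strategy.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TruncationStatus:
    error: Optional[str]      # set when even the zero-shot prompt does not fit
    num_fewshots_kept: int    # how many few-shot examples remain


def truncate_fewshots(question: str, fewshots: List[str], max_tokens: int,
                      count_tokens: Callable[[str], int]) -> Tuple[str, TruncationStatus]:
    # Try keeping as many few-shot examples as possible, dropping from the left.
    for keep in range(len(fewshots), -1, -1):
        prompt = "\n\n".join(fewshots[len(fewshots) - keep:] + [question])
        if count_tokens(prompt) <= max_tokens:
            return prompt, TruncationStatus(error=None, num_fewshots_kept=keep)
    return question, TruncationStatus(
        error="prompt exceeds max length even with zero few-shot examples",
        num_fewshots_kept=0,
    )
```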

artemorloff, Sep 15 '24 16:09