lm-evaluation-harness
Minor features
Features:
- disable `fewshot_as_multiturn` when `apply_chat_template` is not passed or `num_fewshots=0`. Why fail the run at all? For a zero-shot setup, multiturn == simple chat template, so there is no error at all. If the chat template is not enabled, throw a warning and disable multiturn (since it is not available without a chat template). A minimal sketch of this check follows the list.
- pass `predict_only` into the filters' `apply` method. Why? The filters are designed to be used even with additional ML models (a reward model, for example). If one runs lm-eval with `predict_only`, this may mean that the filter should not be applied. Now users may customize filters to use the `predict_only` info to manage filter behaviour (see the filter sketch after the list).
- add a `filter_device` param to the CLI. There was a TODO about it. If another LLM is used as a filter, it may need a device that DIFFERS from the one used to run the "main" LLM, e.g. llm-as-a-judge or LLMs that score the generations (also covered by the filter sketch below).
- disable `ensure_ascii` for the `apply_chat_template` method of the TemplateAPI class. Now Cyrillic symbols are stored in a valid, readable form (example below).
- add f1_macro and f1_micro metrics (aggregations, in fact) to the registry to handle multi-class classification tasks (sketch below).
- new param in `model_args` for APIs: `timeout`. When running a vLLM server and using lm-eval in OpenAI API mode to make requests to this server, the timeout may need to be increased (e.g. to run Llama-3.1-405B; for me there were lots of connection errors, which were solved by increasing the timeout param). A usage sketch follows below.
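
A minimal sketch of the `fewshot_as_multiturn` check described in the first item, assuming a helper of this shape (the function name and the warning text are illustrative, not the actual code in the harness):

```python
import warnings

def resolve_fewshot_as_multiturn(
    fewshot_as_multiturn: bool,
    apply_chat_template: bool,
    num_fewshot: int,
) -> bool:
    """Decide whether fewshot_as_multiturn should stay enabled.

    Instead of failing the run, warn and fall back to the plain setup
    whenever the flag cannot have any effect.
    """
    if not fewshot_as_multiturn:
        return False
    if num_fewshot == 0:
        # Zero-shot: a multiturn prompt degenerates to a single chat turn,
        # i.e. it equals the simple chat template, so just drop the flag.
        return False
    if not apply_chat_template:
        # Multiturn few-shot is only defined on top of a chat template.
        warnings.warn(
            "fewshot_as_multiturn has no effect without apply_chat_template; disabling it."
        )
        return False
    return True
```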
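
The two filter-related changes can be illustrated together. The sketch below is a hypothetical custom filter, not the harness's actual `Filter` base class: how `predict_only` reaches `apply`, the judge model, and the way the `filter_device` value reaches the constructor are all assumptions for illustration.

```python
from transformers import pipeline

class JudgeScoreFilter:
    """Hypothetical filter that scores generations with a second LLM.

    `device` is meant to come from the new `filter_device` CLI param, so the
    judge can sit on a different GPU than the model being evaluated.
    """

    def __init__(self, judge_model: str, device: str = "cuda:1"):
        # e.g. judge_model = "OpenAssistant/reward-model-deberta-v3-large-v2"
        self.scorer = pipeline("text-classification", model=judge_model, device=device)

    def apply(self, resps, docs, predict_only: bool = False):
        # With --predict_only only raw generations are wanted, so the
        # (potentially expensive) judge pass is skipped entirely.
        if predict_only:
            return resps
        return [
            [(resp, self.scorer(resp)[0]["score"]) for resp in resp_set]
            for resp_set in resps
        ]
```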
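
The `ensure_ascii` point is standard `json.dumps` behaviour: with the default `ensure_ascii=True`, Cyrillic text is escaped into `\uXXXX` sequences, while disabling it keeps the characters readable.

```python
import json

msg = {"role": "user", "content": "Привет, мир!"}

json.dumps(msg)                      # '{"role": "user", "content": "\u041f\u0440\u0438..."}'
json.dumps(msg, ensure_ascii=False)  # '{"role": "user", "content": "Привет, мир!"}'
```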
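
The macro/micro F1 aggregations presumably reduce a list of (gold, prediction) pairs with `sklearn.metrics.f1_score`; a self-contained sketch (in the harness these would be wired through the aggregation registry, which is omitted here):

```python
from sklearn.metrics import f1_score

def f1_macro(items):
    """Per-class F1 scores averaged without class weighting."""
    golds, preds = zip(*items)
    return f1_score(golds, preds, average="macro")

def f1_micro(items):
    """F1 computed from global TP/FP/FN counts across all classes."""
    golds, preds = zip(*items)
    return f1_score(golds, preds, average="micro")

# A tiny 3-class example: each item is a (gold, prediction) pair.
pairs = [(0, 0), (1, 2), (2, 2), (1, 1), (0, 2)]
print(f1_macro(pairs), f1_micro(pairs))
```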
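
On the client side, the `timeout` param boils down to forwarding a larger value into the HTTP request, roughly as below; driving it from the CLI would look something like `--model_args base_url=http://localhost:8000/v1/completions,timeout=600`, where the value and the other keys are illustrative.

```python
import requests

def post_with_timeout(url: str, payload: dict, timeout: float = 600.0) -> dict:
    # A 405B model behind a vLLM server can take minutes per batch, so the
    # read timeout has to be raised well above typical client defaults.
    response = requests.post(url, json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()
```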