lighteval
[RFC] Rework the suites to be inferred
This PR proposes a change in the way tasks and suites are managed. It is a proof of concept, not yet extensively tested, meant to check whether this direction is worth pursuing.
This PR's objective is twofold:
- `num_fewshots` is now inferred to `0` by default if not specified.
- `suite` is now automatically selected when a single suite contains the requested task. If the task is handled by more than one suite, an error is thrown.
Example of commands that would work:
- lighteval accelerate model_name=gpt2 "extended|mt_bench|0"
+ lighteval accelerate model_name=gpt2 mt_bench
- lighteval accelerate model_name=gpt2 "leaderboard|truthfulqa:mc|0"
+ lighteval accelerate model_name=gpt2 truthfulqa:mc
Example of commands that would not work:
lighteval accelerate model_name=gpt2 gsm8k
❌ # ValueError: More than one suite available for task gsm8k: ['leaderboard', 'lighteval']
lighteval accelerate model_name=gpt2 "leaderboard|gsm8k"
✅
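To make the intended behavior concrete, here is a minimal sketch of the inference rule in Python. It is not the code from this PR: the `SUITES` registry, the `resolve_task` function, and the "no suite" error message are hypothetical names chosen for illustration, and the sketch assumes that a two-part specification is always `suite|task`.

```python
from typing import Dict, Set, Tuple

# Hypothetical registry: suite name -> task names it provides (illustrative data only).
SUITES: Dict[str, Set[str]] = {
    "leaderboard": {"truthfulqa:mc", "gsm8k"},
    "lighteval": {"gsm8k"},
    "extended": {"mt_bench"},
}


def resolve_task(spec: str, suites: Dict[str, Set[str]] = SUITES) -> Tuple[str, str, int]:
    """Return (suite, task, num_fewshots) for a task specification string."""
    parts = spec.split("|")
    if len(parts) == 3:  # "suite|task|num_fewshots": nothing to infer
        suite, task, fewshots = parts
        return suite, task, int(fewshots)
    if len(parts) == 2:  # assumed to be "suite|task"; num_fewshots defaults to 0
        suite, task = parts
        return suite, task, 0
    if len(parts) != 1:
        raise ValueError(f"Cannot parse task specification: {spec}")

    # Only the task name was given: look for the suites that contain it.
    task = parts[0]
    candidates = sorted(name for name, tasks in suites.items() if task in tasks)
    if not candidates:
        # Hypothetical message; the actual error raised by the library may differ.
        raise ValueError(f"No suite available for task {task}")
    if len(candidates) > 1:
        raise ValueError(f"More than one suite available for task {task}: {candidates}")
    return candidates[0], task, 0
```

With this registry, `resolve_task("mt_bench")` picks the `extended` suite on its own, while `resolve_task("gsm8k")` raises because both `leaderboard` and `lighteval` provide the task.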
Caveat: this is currently imperfect because the multimodal and community suites are opt-in. Specifying only the task name for a task in one of those suites still results in the same error as before.
The extended suites would have been affected as well, but, as discussed with @NathanHB, we chose to remove the protection for extended tasks for now; if that protection remains useful, I'm happy to revert that part of the code.