[RFC] Rework the suites to be inferred

Open LysandreJik opened this issue 7 months ago • 0 comments

This PR proposes a change in the way tasks and suites are managed. It's a proof of concept that hasn't been extensively tested yet; the goal is to check whether this direction is worth pursuing.

This PR's objective is twofold:

num_fewshots now defaults to 0 when it is not specified.
The suite is now selected automatically when exactly one suite contains the requested task. If the task is handled by more than one suite, a ValueError is raised.

Examples of commands that would work:

- lighteval accelerate model_name=gpt2 "extended|mt_bench|0"
+ lighteval accelerate model_name=gpt2 mt_bench

- lighteval accelerate model_name=gpt2 "leaderboard|truthfulqa:mc|0"
+ lighteval accelerate model_name=gpt2 truthfulqa:mc

Examples of commands that would not work, and how to fix them:

lighteval accelerate model_name=gpt2 gsm8k  
❌ # ValueError: More than one suite available for task gsm8k: ['leaderboard', 'lighteval']

lighteval accelerate model_name=gpt2 "leaderboard|gsm8k"
✅
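
To make the intended behaviour concrete, here is a minimal sketch of the resolution logic described above. It is not the code in this PR: the SUITES registry and the resolve_task helper are hypothetical names used only for illustration, and the sketch assumes that a two-part specification is always read as "suite|task".

```python
# Hypothetical sketch of the inference rules; not lighteval's actual implementation.
# SUITES is an illustrative registry, not the real list of suites and tasks.
SUITES = {
    "leaderboard": {"gsm8k", "truthfulqa:mc"},
    "lighteval": {"gsm8k"},
    "extended": {"mt_bench"},
}


def resolve_task(spec: str) -> tuple[str, str, int]:
    """Expand 'task', 'suite|task' or 'suite|task|num_fewshots' into a full triple."""
    parts = spec.split("|")
    if len(parts) == 1:
        # Only the task name was given: suite and num_fewshots are both inferred.
        suite, task, fewshots = None, parts[0], "0"
    elif len(parts) == 2:
        # Assumption: two parts mean 'suite|task', with num_fewshots defaulting to 0.
        suite, task, fewshots = parts[0], parts[1], "0"
    elif len(parts) == 3:
        suite, task, fewshots = parts
    else:
        raise ValueError(f"Cannot parse task specification: {spec!r}")

    if suite is None:
        # Collect every suite that contains the task, sorted for a stable message.
        candidates = sorted(name for name, tasks in SUITES.items() if task in tasks)
        if not candidates:
            raise ValueError(f"No suite available for task {task}")
        if len(candidates) > 1:
            raise ValueError(
                f"More than one suite available for task {task}: {candidates}"
            )
        suite = candidates[0]

    return suite, task, int(fewshots)
```

With this sketch, resolve_task("mt_bench") expands to the same triple as "extended|mt_bench|0", while resolve_task("gsm8k") raises the ValueError shown above because both the leaderboard and lighteval entries of the illustrative registry contain that task.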

Caveat: this is currently imperfect because the multimodal and community suites are opt-in. Specifying only the task name for a task in one of those suites still results in the same error as before this change.

The extended suites would have been affected as well, but as discussed with @NathanHB, we chose to remove the protection for extended tasks for now. If that protection is still useful, I'm happy to revert that part of the code.

Sep 09 '25 15:09 LysandreJik