lighteval
Serbian LLM Benchmark Task
Serbian LLM Benchmark Task Configuration and Prompt Functions
Summary:
This pull request introduces task configurations and prompt functions for evaluating LLMs on several Serbian datasets. The module includes tasks for ARC (Easy and Challenge), BoolQ, Hellaswag, OpenBookQA, PIQA, Winogrande, and the custom OZ Eval dataset. The tasks are defined using the LightevalTaskConfig class, and prompt generation is handled by a reusable serbian_eval_prompt function.
Changes:
- Task Configurations:
  - Configurations for ARC (Easy and Challenge), BoolQ, Hellaswag, OpenBookQA, PIQA, Winogrande, and OZ Eval tasks using LightevalTaskConfig.
  - Enum class HFSubsets added for dataset subset management, improving code maintainability and clarity.
  - create_task_config function allows dynamic task creation with dependency injection for flexibility in dataset and metric selection.
- Prompt Functions:
  - The serbian_eval_prompt function creates a structured multiple-choice prompt in Serbian.
  - The function supports dynamic query and choice generation with configurable tasks.
- Logging:
  - A hello_message banner is printed upon task initialization, listing all available tasks.
  - Task names are dynamically generated and printed using hlog_warn.
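
To illustrate the task-configuration changes listed above, here is a minimal sketch of how an Enum of subsets and a create_task_config factory with injected dependencies could fit together. It is not the code from this PR: TaskConfigStandIn is a hypothetical stand-in for LightevalTaskConfig (whose real fields differ), and the subset values, repository name, task-name prefix, and metric name are placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class HFSubsets(Enum):
    """Dataset subset identifiers (placeholder values, not the PR's real names)."""

    ARC_EASY = "arc_easy_placeholder"
    BOOLQ = "boolq_placeholder"


@dataclass
class TaskConfigStandIn:
    """Hypothetical stand-in for LightevalTaskConfig; the real class differs."""

    name: str
    prompt_function: Callable
    hf_repo: str
    hf_subset: str
    metric: List[str] = field(default_factory=list)


def placeholder_prompt(line: dict, task_name: str = "") -> str:
    """Trivial prompt function used only to show the injection point."""
    return f"{task_name}: {line['query']}"


def create_task_config(
    task_name: str,
    prompt_function: Callable,
    hf_repo: str,
    hf_subset: HFSubsets,
    metric: List[str],
) -> TaskConfigStandIn:
    """Build one task config; dataset, prompt, and metrics are all injected."""
    return TaskConfigStandIn(
        name=task_name,
        prompt_function=prompt_function,
        hf_repo=hf_repo,
        hf_subset=hf_subset.value,
        metric=list(metric),
    )


# One config per subset; adding a dataset means adding one Enum member.
TASKS = [
    create_task_config(
        task_name=f"serbian_example:{subset.name.lower()}",  # assumed naming scheme
        prompt_function=placeholder_prompt,
        hf_repo="example-org/example-dataset",  # placeholder repository
        hf_subset=subset,
        metric=["placeholder_metric"],
    )
    for subset in HFSubsets
]
```

Because the factory receives the prompt function and metric list as arguments, a task that needs a different prompt format can reuse it without changes to the factory itself.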
Key Features:
- Modular Design: Task configurations are modular, reusable, and easily extendable to accommodate new datasets and tasks.
- Improved Readability: Introduction of the HFSubsets Enum improves the readability and maintainability of the dataset subset references.
- Enhanced Flexibility: The create_task_config function simplifies task creation, promoting cleaner and more maintainable code.
- Clear Logging: Logging includes a friendly welcome message and a list of available tasks for easier debugging and interaction.
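
As a rough illustration of the logging feature, the sketch below builds a banner that lists task names and passes it to a logging callable. The banner text and the function names build_hello_message and announce_tasks are assumptions made for this example; the PR prints through hlog_warn, for which the standard library's logger.warning stands in here.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("serbian_eval_sketch")


def build_hello_message(task_names: Iterable[str]) -> str:
    """Return a banner followed by one line per task name, sorted."""
    rule = "=" * 40
    lines = [rule, "Serbian LLM benchmark tasks (sketch)", rule]
    lines += [f"  - {name}" for name in sorted(task_names)]
    return "\n".join(lines)


def announce_tasks(
    task_names: Iterable[str],
    log: Callable[[str], None] = logger.warning,  # stand-in for hlog_warn
) -> str:
    """Log the banner once and return it so callers can inspect it."""
    message = build_hello_message(task_names)
    log(message)
    return message
```

Passing the logging callable as a parameter keeps the banner easy to capture when debugging, for example with `announce_tasks(names, log=captured.append)`.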
Future Enhancements:
- Additional prompt functions can be added for different task types.
- Unit tests should be written to ensure the integrity of prompt generation and task configuration.
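
A unit test for prompt generation might look like the sketch below. Everything in it is an assumption made for illustration: DocStandIn replaces lighteval's document type, serbian_eval_prompt_sketch is a simplified stand-in for the PR's serbian_eval_prompt, and the input keys (query, choices, answer) and the Serbian prompt wording are guesses rather than the actual dataset schema or template.

```python
import string
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocStandIn:
    """Hypothetical stand-in for the document object a prompt function returns."""

    task_name: Optional[str]
    query: str
    choices: List[str]
    gold_index: int


def serbian_eval_prompt_sketch(line: dict, task_name: Optional[str] = None) -> DocStandIn:
    """Format one dataset row as a lettered multiple-choice prompt in Serbian."""
    letters = list(string.ascii_uppercase[: len(line["choices"])])
    options = "\n".join(
        f"{letter}. {choice}" for letter, choice in zip(letters, line["choices"])
    )
    # "Pitanje" = question, "Ponuđeni odgovori" = offered answers, "Odgovor" = answer.
    query = f"Pitanje: {line['query']}\n\nPonuđeni odgovori:\n{options}\n\nOdgovor:"
    return DocStandIn(task_name, query, letters, int(line["answer"]))


SAMPLE_ROW = {
    "query": "Koji je glavni grad Srbije?",
    "choices": ["Niš", "Beograd", "Novi Sad"],
    "answer": 1,
}


def test_prompt_lists_every_choice():
    doc = serbian_eval_prompt_sketch(SAMPLE_ROW, task_name="example_task")
    for choice in SAMPLE_ROW["choices"]:
        assert choice in doc.query


def test_gold_index_points_at_answer_letter():
    doc = serbian_eval_prompt_sketch(SAMPLE_ROW)
    assert doc.choices == ["A", "B", "C"]
    assert doc.choices[doc.gold_index] == "B"
```

The test functions follow pytest naming, so a runner such as pytest would collect them automatically; they can also be called directly.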