[Evals API][8/n] AnswerParsingScoringFn for MMLU
Continuation of https://github.com/meta-llama/llama-stack/pull/333
TL;DR
- Introduce a registrable AnswerParsingScoringFn, configured through an AnswerParsingScoringContext, so that a scoring function can be registered together with its context
- Remove the parameters field; the context alone is sufficient to carry everything related to scoring
- AnswerParsingScoringFn can run any benchmark made up of multiple-choice tasks (see the apps PR)
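The idea behind answer-parsing scoring can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ParsingContext`, its `parsing_regexes` field, the default regex, and the `generated_answer` / `expected_answer` row keys are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParsingContext:
    """Hypothetical stand-in for AnswerParsingScoringContext (names assumed)."""

    # Regexes tried in order; the first capture group holds the choice letter.
    parsing_regexes: List[str] = field(
        default_factory=lambda: [r"Answer\s*:\s*\(?([A-D])\)?"]
    )


def parse_answer(generated: str, ctx: ParsingContext) -> Optional[str]:
    """Return the first choice letter matched by any regex, or None."""
    for pattern in ctx.parsing_regexes:
        match = re.search(pattern, generated)
        if match:
            return match.group(1)
    return None


def score_row(row: Dict[str, str], ctx: ParsingContext) -> Dict[str, float]:
    """Score 1.0 when the parsed letter equals the expected answer, else 0.0."""
    parsed = parse_answer(row["generated_answer"], ctx)
    return {"score": 1.0 if parsed == row["expected_answer"] else 0.0}
```

Because the regexes live in the context rather than in the function, the same scoring logic could in principle be reused across multiple-choice benchmarks by registering it with a different context.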
Test
PROVIDER_ID=test-meta PROVIDER_CONFIG=llama_stack/providers/tests/scoring/provider_config_example.yaml pytest -s llama_stack/providers/tests/scoring/test_scoring.py --tb=short --disable-warnings
App
- See App: https://github.com/meta-llama/llama-stack-apps/pull/105