llama-stack

[Evals API][8/n] AnswerParsingScoringFn for MMLU

Open yanxi0830 opened this issue 1 year ago • 0 comments

Continuation of https://github.com/meta-llama/llama-stack/pull/333

TL;DR

Introduce a registrable AnswerParsingScoringFn, together with an AnswerParsingScoringContext, so that scoring functions can be registered with their context.
Remove the parameters field; the context alone is sufficient to carry everything scoring-related.
AnswerParsingScoringFn can run all benchmarks with multiple-choice tasks (see the apps PR).
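
To illustrate the idea behind an answer-parsing scoring function, here is a minimal standalone sketch. It is not the llama-stack implementation: `ParsingContext`, `score_row`, the `parsing_regexes` field, and the row keys `generated_answer` / `expected_answer` are hypothetical names chosen for illustration. The sketch assumes the context holds a list of regexes, the first match yields the parsed choice letter, and the score is an exact match against the expected answer.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParsingContext:
    # Hypothetical stand-in for a scoring context: regexes that each
    # capture the chosen letter in group 1.
    parsing_regexes: List[str] = field(
        default_factory=lambda: [r"Answer\s*:\s*\(?([A-D])\)?"]
    )


def parse_answer(text: str, context: ParsingContext) -> Optional[str]:
    # Try each regex in order and return the first captured letter, if any.
    for pattern in context.parsing_regexes:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None


def score_row(row: Dict[str, str], context: ParsingContext) -> Dict[str, float]:
    # Score 1.0 when the parsed letter equals the expected answer, else 0.0.
    parsed = parse_answer(row["generated_answer"], context)
    return {"score": 1.0 if parsed == row["expected_answer"] else 0.0}


# Illustrative rows (assumed shape, not taken from the PR).
rows = [
    {"generated_answer": "Paris is the capital. Answer: B", "expected_answer": "B"},
    {"generated_answer": "Answer: (C)", "expected_answer": "A"},
    {"generated_answer": "I am not sure.", "expected_answer": "D"},
]

ctx = ParsingContext()
scores = [score_row(row, ctx)["score"] for row in rows]
accuracy = sum(scores) / len(scores)
```

Keeping the regexes in the context object, rather than in a separate parameters field, means a different benchmark format only needs a different context, while the scoring logic stays unchanged.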

Test

PROVIDER_ID=test-meta PROVIDER_CONFIG=llama_stack/providers/tests/scoring/provider_config_example.yaml pytest -s llama_stack/providers/tests/scoring/test_scoring.py --tb=short --disable-warnings

App

See App: https://github.com/meta-llama/llama-stack-apps/pull/105

Oct 31 '24 23:10 yanxi0830