llama-stack
[Evals API][3/n] scoring_functions / scoring meta-reference implementations
Continuation of https://github.com/meta-llama/llama-stack/pull/288
TL;DR
- scoring depends on datasetio & datasets
- Enforce dataset_schema: the client enforces that the dataset schema follows the specified column names/types
- Add clients for datasets / datasetio / scoring
- Basic meta-reference impl for scoring/score, scoring/score_batch
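
To illustrate the idea behind the points above, here is a rough standalone sketch of schema enforcement followed by row-level scoring. It is not the code in this PR: the column names, `EXPECTED_SCHEMA`, `validate_rows`, `equality_score`, and `score_rows` are all hypothetical names chosen for illustration, and the exact-match scorer is only an assumed example of a scoring function.

```python
from typing import Any, Dict, List

# Hypothetical schema (an assumption, not taken from the PR):
# column name -> expected Python type.
EXPECTED_SCHEMA: Dict[str, type] = {
    "input_query": str,
    "expected_answer": str,
    "generated_answer": str,
}


def validate_rows(rows: List[Dict[str, Any]], schema: Dict[str, type]) -> None:
    """Raise if any row is missing a column or has a value of the wrong type."""
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            raise ValueError(f"row {i} is missing columns: {sorted(missing)}")
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                raise TypeError(
                    f"row {i} column {column!r} should be {expected_type.__name__}"
                )


def equality_score(row: Dict[str, Any]) -> float:
    """Illustrative scorer: 1.0 on exact match, 0.0 otherwise."""
    return 1.0 if row["expected_answer"] == row["generated_answer"] else 0.0


def score_rows(
    rows: List[Dict[str, Any]],
    schema: Dict[str, type] = EXPECTED_SCHEMA,
) -> Dict[str, Any]:
    """Validate rows against the schema, then return per-row and aggregate scores."""
    validate_rows(rows, schema)
    scores = [equality_score(row) for row in rows]
    accuracy = sum(scores) / len(scores) if scores else 0.0
    return {"score_rows": scores, "accuracy": accuracy}


if __name__ == "__main__":
    example_rows = [
        {"input_query": "2+2?", "expected_answer": "4", "generated_answer": "4"},
        {"input_query": "Capital of France?", "expected_answer": "Paris", "generated_answer": "Lyon"},
    ]
    print(score_rows(example_rows))
```

Validating before scoring means a malformed dataset fails early with a clear error rather than producing misleading scores.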
Next PR
- full evals benchmark task with generation
- LLM as judge scorer impl
Tests
unit tests
PROVIDER_ID=test-meta PROVIDER_CONFIG=llama_stack/providers/tests/scoring/provider_config_example.yaml pytest -s llama_stack/providers/tests/scoring/test_scoring.py --tb=short --disable-warnings
PROVIDER_ID=test-meta PROVIDER_CONFIG=llama_stack/providers/tests/datasetio/provider_config_example.yaml pytest -s llama_stack/providers/tests/datasetio/test_datasetio.py --tb=short --disable-warnings
client tests
python -m llama_stack.apis.datasetio.client $LOCALHOST 5000
python -m llama_stack.apis.datasets.client $LOCALHOST 5000
python -m llama_stack.apis.scoring.client $LOCALHOST 5000