
[Evals API][9/n] SimpleQA evals

Open yanxi0830 opened this issue 1 year ago • 0 comments

Continuation of https://github.com/meta-llama/llama-stack/pull/352

TL;DR

  1. Implement OpenAI's SimpleQA benchmark as a ScoringFn (reference)

[RFC]

  • Option 1: SimpleQAScoringFn: Move each benchmark eval into a separate scoring function with its own context.
  • Option 2 (current): Single LLMAsJudgeScoring for SimpleQA.
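
To make Option 2 concrete, here is a minimal sketch of a single generic LLM-as-judge scoring function that is configured for SimpleQA through a prompt template and a grade-parsing regex. Everything in it is an assumption for illustration, not the code in this PR: the class and method names, the `JudgeFn` callable, the row keys (`input_query`, `expected_answer`, `generated_answer`), the shortened prompt, and the A/B/C letter convention for CORRECT / INCORRECT / NOT_ATTEMPTED.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: any callable that sends a prompt to a judge model and returns its reply.
JudgeFn = Callable[[str], str]

# Assumed letter convention for the judge's verdict.
GRADE_LABELS = {"A": "CORRECT", "B": "INCORRECT", "C": "NOT_ATTEMPTED"}

# Shortened stand-in prompt; the real benchmark prompt is much longer.
SIMPLEQA_TEMPLATE = (
    "Question: {input_query}\n"
    "Gold target: {expected_answer}\n"
    "Predicted: {generated_answer}\n"
    "Grade the prediction as A (correct), B (incorrect) or C (not attempted)."
)


@dataclass
class LLMAsJudgeScoring:
    """Generic judge-based scorer; the benchmark lives in the template and regex."""

    judge: JudgeFn
    prompt_template: str
    grade_pattern: str = r"\b([ABC])\b"

    def score_row(self, row: Dict[str, str]) -> Dict[str, str]:
        # Fill the template from the row, ask the judge, and parse the letter grade.
        reply = self.judge(self.prompt_template.format(**row))
        match = re.search(self.grade_pattern, reply)
        grade = GRADE_LABELS[match.group(1)] if match else "UNPARSED"
        return {"grade": grade, "judge_reply": reply}

    def aggregate(self, results: List[Dict[str, str]]) -> Dict[str, float]:
        # Count each grade and report the fraction graded CORRECT.
        counts = {label: 0 for label in [*GRADE_LABELS.values(), "UNPARSED"]}
        for result in results:
            counts[result["grade"]] += 1
        total = len(results)
        accuracy = counts["CORRECT"] / total if total else 0.0
        return {"accuracy": accuracy, **{k: float(v) for k, v in counts.items()}}


def fake_judge(prompt: str) -> str:
    """Deterministic stand-in for a judge model, used only for this example."""
    if "Predicted: I don't know" in prompt:
        return "C"
    return "A" if "Predicted: Paris" in prompt else "B"


scorer = LLMAsJudgeScoring(judge=fake_judge, prompt_template=SIMPLEQA_TEMPLATE)
rows = [
    {"input_query": "Capital of France?", "expected_answer": "Paris", "generated_answer": answer}
    for answer in ["Paris", "Lyon", "I don't know", "Paris"]
]
results = [scorer.score_row(row) for row in rows]
summary = scorer.aggregate(results)
```

Under Option 1, the template, regex, and aggregation would instead live inside a dedicated SimpleQAScoringFn, which trades reuse of the generic judge for benchmark-specific context kept in one place.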

Test

  • Test client app in: https://github.com/meta-llama/llama-stack-apps/pull/105

yanxi0830, Nov 01 '24 07:11