llama-stack

[rag evals][3/n] add ability to eval retrieval + generation in agentic eval pipeline

Open yanxi0830 opened this issue 10 months ago • 0 comments

What does this PR do?

  • This PR lets users evaluate retrieval and generation, either separately or as a whole, by passing an AgentConfig to the /eval API.
  • The memory_retrieval context is stored in the "context" column, so scoring functions that evaluate the retrieved context can read it from there.
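
To illustrate how a "context" column lets retrieval and generation be scored separately, here is a minimal standalone sketch. It does not use the llama-stack API: the row layout (apart from the "context" column named above), the two scoring functions, and the `evaluate` helper are all hypothetical and only show the idea of scoring each stage from its own column.

```python
from typing import Callable, Dict, List

# Hypothetical row layout: only the "context" column comes from the PR text;
# the other column names are assumptions made for this sketch.
Row = Dict[str, str]


def score_generation(row: Row) -> float:
    """Return 1.0 if the expected answer appears in the generated answer."""
    return float(row["expected_answer"].lower() in row["generated_answer"].lower())


def score_retrieval(row: Row) -> float:
    """Return 1.0 if the expected answer appears in the retrieved context."""
    return float(row["expected_answer"].lower() in row["context"].lower())


def evaluate(rows: List[Row], scorers: Dict[str, Callable[[Row], float]]) -> Dict[str, float]:
    """Average each scoring function over all rows."""
    return {name: sum(fn(r) for r in rows) / len(rows) for name, fn in scorers.items()}


rows = [
    {
        "input_query": "What is the capital of France?",
        "context": "Paris is the capital and largest city of France.",
        "generated_answer": "The capital of France is Paris.",
        "expected_answer": "Paris",
    },
    {
        "input_query": "Who wrote Hamlet?",
        # Retrieval missed the author here, but generation still answered correctly.
        "context": "Hamlet is a tragedy set in Denmark.",
        "generated_answer": "Hamlet was written by William Shakespeare.",
        "expected_answer": "Shakespeare",
    },
]

scores = evaluate(rows, {"retrieval": score_retrieval, "generation": score_generation})
print(scores)
```

Keeping the retrieved context in its own column means a retrieval scorer and a generation scorer can run over the same rows, so a drop in one stage is visible even when the other stage compensates, as in the second row.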

Test Plan

  • E2E Test RAG Agent Notebook: https://gist.github.com/yanxi0830/0377594d29958f9b6f9317ab049fa836

Sources

Please link relevant resources if necessary.

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Ran pre-commit to handle lint / formatting issues.
  • [ ] Read the contributor guideline, Pull Request section?
  • [ ] Updated relevant documentation.
  • [ ] Wrote necessary unit or integration tests.

yanxi0830, Dec 20 '24 04:12