llama-stack
[rag evals][3/n] add ability to eval retrieval + generation in agentic eval pipeline
What does this PR do?
- This PR lets users evaluate retrieval and generation, both separately and as a whole, by passing an AgentConfig to the /eval API.
- The memory_retrieval context is stored in the "context" column, which scoring functions can use to evaluate the retrieved context.
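
To illustrate the idea, here is a minimal sketch of how a "context" column could be scored separately from the generated answer. It is not the llama-stack implementation: apart from the "context" column name, every name here (`generated_answer`, `expected_answer`, `score_retrieval`, `score_generation`, `evaluate`) is a hypothetical stand-in, and simple substring matching stands in for a real scoring function.

```python
from typing import Dict, List

# Hypothetical row schema. Only the "context" column name comes from this PR;
# "generated_answer" and "expected_answer" are assumed for illustration.
Row = Dict[str, str]


def score_retrieval(row: Row) -> float:
    """Toy retrieval score: 1.0 if the expected answer appears in the retrieved context."""
    return 1.0 if row["expected_answer"].lower() in row["context"].lower() else 0.0


def score_generation(row: Row) -> float:
    """Toy generation score: 1.0 if the expected answer appears in the generated answer."""
    return 1.0 if row["expected_answer"].lower() in row["generated_answer"].lower() else 0.0


def evaluate(rows: List[Row]) -> Dict[str, float]:
    """Average the retrieval score, the generation score, and their product per row."""
    if not rows:
        return {"retrieval": 0.0, "generation": 0.0, "end_to_end": 0.0}
    retrieval = [score_retrieval(r) for r in rows]
    generation = [score_generation(r) for r in rows]
    # A row counts end to end only if both retrieval and generation succeeded.
    end_to_end = [r * g for r, g in zip(retrieval, generation)]
    n = len(rows)
    return {
        "retrieval": sum(retrieval) / n,
        "generation": sum(generation) / n,
        "end_to_end": sum(end_to_end) / n,
    }


rows = [
    {
        "context": "Paris is the capital of France.",
        "generated_answer": "The capital of France is Paris.",
        "expected_answer": "Paris",
    },
    {
        "context": "Munich is a city in Bavaria.",
        "generated_answer": "The capital of Germany is Berlin.",
        "expected_answer": "Berlin",
    },
]
results = evaluate(rows)
print(results)
```

In the second row the answer is correct even though the retrieved context does not contain it, which is the kind of case that scoring retrieval and generation separately is meant to surface.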
Test Plan
- E2E Test RAG Agent Notebook: https://gist.github.com/yanxi0830/0377594d29958f9b6f9317ab049fa836
Sources
Please link relevant resources if necessary.
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the contributor guideline, Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.