llama-stack



[rag evals][3/n] add ability to eval retrieval + generation in agentic eval pipeline

Open yanxi0830 opened this issue 10 months ago • 0 comments

What does this PR do?

This PR lets users evaluate retrieval and generation, either separately or as a whole, by passing an AgentConfig to the /eval API.
The memory_retrieval context is stored in the "context" column, which scoring functions can use to evaluate the retrieved context.
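
To illustrate the idea, here is a minimal, self-contained sketch of scoring retrieval and generation separately from the same rows. It does not use the llama-stack API: the row layout (other than the "context" column), the `score_retrieval`, `score_generation`, and `evaluate` helpers, and the sample data are all hypothetical and only show how a scoring function could read the "context" column.

```python
from typing import Callable, Dict, List

# Hypothetical row layout: only the "context" column name comes from this PR;
# the other column names are assumptions made for illustration.
Row = Dict[str, str]


def score_retrieval(row: Row) -> float:
    """Return 1.0 if the expected answer appears in the retrieved context."""
    return float(row["expected_answer"].lower() in row["context"].lower())


def score_generation(row: Row) -> float:
    """Return 1.0 if the generated answer matches the expected answer exactly."""
    generated = row["generated_answer"].strip().lower()
    expected = row["expected_answer"].strip().lower()
    return float(generated == expected)


def evaluate(rows: List[Row], scorers: Dict[str, Callable[[Row], float]]) -> Dict[str, float]:
    """Average each scorer over all rows (0.0 when there are no rows)."""
    if not rows:
        return {name: 0.0 for name in scorers}
    return {name: sum(fn(r) for r in rows) / len(rows) for name, fn in scorers.items()}


# Made-up sample rows, as an agent run might produce them.
rows = [
    {
        "input_query": "What is the capital of France?",
        "context": "Paris is the capital of France.",
        "generated_answer": "Paris",
        "expected_answer": "Paris",
    },
    {
        "input_query": "Who wrote Hamlet?",
        "context": "Hamlet is a tragedy set in Denmark.",
        "generated_answer": "Shakespeare",
        "expected_answer": "Shakespeare",
    },
]

results = evaluate(rows, {"retrieval": score_retrieval, "generation": score_generation})
print(results)
```

In this sketch the second row gets the answer right even though the retrieved context does not contain it, which is the kind of gap that separate retrieval and generation scores are meant to surface.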

Test Plan

E2E Test RAG Agent Notebook: https://gist.github.com/yanxi0830/0377594d29958f9b6f9317ab049fa836

Sources

Please link relevant resources if necessary.

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[ ] Ran pre-commit to handle lint / formatting issues.
[ ] Read the contributor guideline, Pull Request section?
[ ] Updated relevant documentation.
[ ] Wrote necessary unit or integration tests.

Dec 20 '24 04:12 yanxi0830