torchmetrics
Need higher-level RAG metrics
Problem & Motivation
There is a lot of interest in high-accuracy Q&A, for example via Retrieval Augmented Generation (RAG). RAG accuracy is largely driven by how well vector search retrieves the correct context for an LLM to answer questions. When evaluating embedding models, vector search retrieval metrics are helpful but insufficient, because they don't reveal how well the retrieved content actually answers the target questions.
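To illustrate the gap, here is a minimal sketch in plain Python (not torchmetrics or ragulate code). The helper names `recall_at_k` and `token_f1`, the document ids, and the example answers are all hypothetical, and token-overlap F1 is only a crude stand-in for a real answer-quality metric. It shows a query where retrieval looks perfect while the generated answer is still wrong.

```python
from collections import Counter


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant document ids that appear in the top-k retrieved ids."""
    top_k = set(retrieved_ids[:k])
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(top_k & relevant) / len(relevant)


def token_f1(predicted, reference):
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # occurs in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data for a single question (made up for illustration).
retrieved = ["doc_7", "doc_2", "doc_9"]   # ids returned by vector search
relevant = ["doc_2"]                      # ids that contain the answer
generated_answer = "the warranty lasts one year"
reference_answer = "the warranty lasts two years"

# Retrieval-level view: the relevant document was found in the top 3.
retrieval_score = recall_at_k(retrieved, relevant, k=3)

# Answer-level view: the generated answer still disagrees with the reference.
answer_score = token_f1(generated_answer, reference_answer)

print(f"recall@3 = {retrieval_score:.2f}, answer token F1 = {answer_score:.2f}")
```

A retrieval-only metric would report this query as a success, which is why an answer-level score is needed alongside it.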
Pitch
I'd love to see an integration with a tool like our new ragulate library (Apache 2.0 licensed), which would simplify evaluating models on RAG Q&A: https://github.com/epinzur/ragulate/tree/main
Additional context
I was originally going to suggest integrating with trulens, but we built ragulate to automate much of the process of using trulens, and we'd love feedback on it.