torchmetrics: Need higher-level RAG metrics

Open devinbost opened this issue 8 months ago • 2 comments

Problem & Motivation

There is a huge wave of interest in high-accuracy question answering (Q&A), for example via Retrieval Augmented Generation (RAG). RAG accuracy is largely driven by how well vector search retrieves the right context for an LLM to answer a question. When evaluating embedding models, vector search retrieval metrics (such as recall@k or MRR) are helpful but insufficient: they measure whether the documents labeled relevant were retrieved, not how well the retrieved content actually answers the target questions.
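To illustrate the gap, here is a minimal sketch under stated assumptions. The functions `recall_at_k` and `answer_token_coverage` are hypothetical helpers, not part of the torchmetrics or ragulate APIs. Token overlap with a reference answer is only a crude stand-in for the LLM-judged, answer-level scoring that tools such as trulens automate. In the example, the retriever scores perfectly on recall@1 because the chunk labeled relevant was returned, yet that chunk does not contain the answer.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens; punctuation is dropped.
    return set(re.findall(r"\w+", text.lower()))


def recall_at_k(retrieved_ids: List[str], relevant_ids: Set[str], k: int) -> float:
    """Classic retrieval metric: fraction of labeled-relevant IDs found in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def answer_token_coverage(retrieved_texts: List[str], reference_answer: str, k: int) -> float:
    """Hypothetical answer-level proxy: fraction of reference-answer tokens
    that appear anywhere in the top-k retrieved passages."""
    answer = _tokens(reference_answer)
    if not answer:
        return 0.0
    context = _tokens(" ".join(retrieved_texts[:k]))
    return len(answer & context) / len(answer)


# Question: "When was the Eiffel Tower completed?"  Reference answer: "1889".
corpus = {
    "d1": "The Eiffel Tower is in Paris.",          # labeled relevant (on topic)
    "d2": "Construction finished in 1889.",          # actually holds the answer
}
retrieved = ["d1", "d2"]
texts = [corpus[doc_id] for doc_id in retrieved]

print(recall_at_k(retrieved, {"d1"}, k=1))           # 1.0: retrieval looks perfect
print(answer_token_coverage(texts, "1889", k=1))     # 0.0: context cannot answer
print(answer_token_coverage(texts, "1889", k=2))     # 1.0 once d2 is included
```

A metric at the level of "does the retrieved context let the model answer correctly" is what the retrieval score alone cannot reveal.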

Pitch

I'd love to see an integration with a tool like our new ragulate library (Apache 2.0 licensed), which would simplify evaluating models on RAG Q&A: https://github.com/epinzur/ragulate/tree/main

Additional context

I was going to suggest that you integrate with trulens, but then realized that ragulate, which we built, already automates much of the process of using trulens, and we'd love feedback on it.

Jun 14 '24 14:06 devinbost