
About RagChecker

Open binghangli378 opened this issue 1 year ago • 4 comments

Recently I came across RagChecker, a project similar to ragas. It offers a new perspective on evaluating RAG pipelines by looking separately at:

  • Relevant chunks
  • Irrelevant chunks used by the model
  • Irrelevant chunks ignored by the model

This new perspective will provide a more detailed evaluation of the model’s performance, allowing for a deeper understanding of how different types of data chunks impact the evaluation process.
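To make the three categories concrete, here is a minimal sketch of how retrieved chunks could be bucketed. It assumes you already have two per-chunk judgments (whether the chunk is relevant to the question, and whether the model drew on it in its answer); the `ChunkRole` enum and `classify_chunks` helper are hypothetical names for illustration, not part of ragas or RagChecker.

```python
from enum import Enum
from typing import List


class ChunkRole(Enum):
    # Hypothetical labels mirroring the three categories listed above.
    RELEVANT = "relevant"
    IRRELEVANT_USED = "irrelevant_used"
    IRRELEVANT_IGNORED = "irrelevant_ignored"


def classify_chunks(relevant: List[bool], used: List[bool]) -> List[ChunkRole]:
    """Assign each retrieved chunk to one of the three categories.

    relevant[i]: chunk i is relevant to the question (assumed to be judged upstream).
    used[i]:     the model's answer drew on chunk i (assumed to be judged upstream).
    """
    if len(relevant) != len(used):
        raise ValueError("relevant and used must have the same length")
    roles = []
    for is_relevant, is_used in zip(relevant, used):
        if is_relevant:
            roles.append(ChunkRole.RELEVANT)
        elif is_used:
            roles.append(ChunkRole.IRRELEVANT_USED)
        else:
            roles.append(ChunkRole.IRRELEVANT_IGNORED)
    return roles


# Three retrieved chunks: one relevant, one irrelevant but used, one irrelevant and ignored.
roles = classify_chunks(relevant=[True, False, False], used=[True, True, False])
```

How the `relevant` and `used` judgments are produced (for example, by an LLM judge) is left open here.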

I am willing to design some new evaluation methods based on it. Please let me know if you are open to this idea, and I can provide further assistance or code examples.

binghangli378 avatar Jul 05 '24 09:07 binghangli378

@binghangli378 that is a very interesting idea, reopening this to track it further. Would you still like to help out on this?

@shahules786 something we can consider for #1010 ?

jjmachan avatar Aug 02 '24 06:08 jjmachan

@binghangli378 Yes, this is very interesting. From what I observed they have a few more metrics that are not available in Ragas (note I just added #1174). I think the two metrics that would be beneficial are:

  1. self-knowledge: this would be something like a 1 - faithfulness score. Used to measure how much of the generated response contains knowledge from the LLM itself.
  2. noise sensitivity: this is more interesting, I think what they are trying to achieve is
     (number of incorrect claims in the generated answer that came from irrelevant chunks) / (total number of claims in the answer)

This could be used to understand how badly noise in the context is affecting the quality of the generated answer. I also found this paper showing that noise in the retrieved context affects answer quality.
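A rough sketch of the two ratios described above, assuming each claim in the answer has already been labeled as correct or incorrect and attributed to at most one retrieved chunk. The `Claim` dataclass and both function names are hypothetical, used only for illustration; this is not the ragas or RagChecker implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    text: str
    correct: bool                # assumed label: claim agrees with the ground truth
    source_chunk: Optional[int]  # index of the supporting chunk, None if unsupported


def noise_sensitivity(claims: List[Claim], relevant: List[bool]) -> float:
    """Incorrect claims attributed to irrelevant chunks / total claims."""
    if not claims:
        return 0.0
    noisy = sum(
        1
        for c in claims
        if not c.correct
        and c.source_chunk is not None
        and not relevant[c.source_chunk]
    )
    return noisy / len(claims)


def self_knowledge(claims: List[Claim]) -> float:
    """Share of claims supported by no retrieved chunk (roughly 1 - faithfulness)."""
    if not claims:
        return 0.0
    unsupported = sum(1 for c in claims if c.source_chunk is None)
    return unsupported / len(claims)


# Chunk 0 is relevant, chunk 1 is noise.
relevant = [True, False]
claims = [
    Claim("A", correct=True, source_chunk=0),
    Claim("B", correct=False, source_chunk=1),   # incorrect, from the noisy chunk
    Claim("C", correct=True, source_chunk=None), # from the LLM's own knowledge
    Claim("D", correct=False, source_chunk=1),   # incorrect, from the noisy chunk
]
print(noise_sensitivity(claims, relevant))  # 0.5
print(self_knowledge(claims))               # 0.25
```

The hard part in practice is the labeling step (claim extraction, correctness, and attribution to a chunk), which this sketch takes as given.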

Tagging you guys in case you're interested in contributing. I have added them to the metrics roadmap. @sky-2002 @vaishakhRaveendran

shahules786 avatar Aug 06 '24 05:08 shahules786

I can take up noise-sensitivity. In fact, we had discussed something similar, which I was referring to as attributing each claim in the answer to some context.

sky-2002 avatar Aug 10 '24 09:08 sky-2002

@sky-2002 Sure, can you please comment on this issue so that I can assign it to you?

shahules786 avatar Aug 10 '24 14:08 shahules786