Implementation of Noise sensitivity metrics from RAGChecker

Open sahusiddharth opened this issue 1 year ago • 12 comments

Solves:

  • #1185

  • Took inspiration from the noise sensitivity metrics in RAGChecker from AWS.

  • I have tested it locally; it works and returns results.

Input

from datasets import Dataset 
from ragas.metrics import noise_sensitivity_relevant, noise_sensitivity_irrelevant
from ragas import evaluate
data_sample = {
    "question": ["What is the Life Insurance Corporation of India (LIC) known for?"],
    "ground_truth": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, established in 1956 through the nationalization of the insurance industry. It is known for managing a large portfolio of investments."],
    "answer": ["The Life Insurance Corporation of India (LIC) is the largest insurance company in India, known for its vast portfolio of investments. LIC contributes to the financial stability of the country."],
    "contexts": [["The Life Insurance Corporation of India (LIC) was established in 1956 following the nationalization of the insurance industry in India.",
        "LIC is the largest insurance company in India, with a vast network of policyholders and huge investments.",
        "As the largest institutional investor in India, LIC manages substantial funds, contributing to the financial stability of the country.",
        "The Indian economy is one of the fastest-growing major economies in the world, thanks to sectors like finance, technology, manufacturing etc."]]
}


dataset = Dataset.from_dict(data_sample)
metrics = [noise_sensitivity_relevant, noise_sensitivity_irrelevant]
score = evaluate(dataset, metrics=metrics)
score.to_pandas()

sahusiddharth avatar Aug 12 '24 18:08 sahusiddharth

Could you please provide some suggestions on the types of documentation that might be required? Your input would be greatly appreciated.

sahusiddharth avatar Aug 13 '24 14:08 sahusiddharth

Hey @sahusiddharth, I did see this doc written by one of the authors of RAGChecker, and I have also emailed him asking for an intuitive explanation of noise sensitivity (let's wait a few hours and see if he replies). Otherwise, it would be nice to follow the format of the existing metrics docs, as here: https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html

shahules786 avatar Aug 13 '24 17:08 shahules786

Hi @shahules786,

Got it. Please let me know when you hear back from the author. In the meantime, I’ll check the metrics format you mentioned.

Thanks!

sahusiddharth avatar Aug 13 '24 17:08 sahusiddharth

Hi @shahules786,

I wanted to clarify something regarding the noise sensitivity implementation. There are two types: one for when relevant context is retrieved and another for when irrelevant context is retrieved. The current implementation only addresses the relevant context.

Would you prefer that I add the handling for irrelevant context first, or should I complete the documentation for the basic implementation before proceeding with the additional functionality?
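
To make the two variants concrete, here is a minimal sketch of how such a score could be computed once each claim in the answer has been judged. It assumes the RAGChecker-style definition (the share of answer claims that are incorrect and also supported by the retrieved chunks of the chosen kind); the function name and the boolean inputs are hypothetical and are not taken from this PR's implementation.

```python
from typing import Sequence


def noise_sensitivity(
    claim_is_incorrect: Sequence[bool],
    claim_entailed_by_chunks: Sequence[bool],
) -> float:
    """Fraction of answer claims that are incorrect AND entailed by the chunks.

    Pass entailment flags computed against relevant chunks for the "relevant"
    variant, or against irrelevant chunks for the "irrelevant" variant.
    Both inputs are assumed to come from an upstream LLM judge (not shown).
    """
    if len(claim_is_incorrect) != len(claim_entailed_by_chunks):
        raise ValueError("both sequences must have one entry per claim")
    if not claim_is_incorrect:
        return 0.0  # no claims in the answer, nothing to be misled about
    misled = sum(
        1
        for wrong, entailed in zip(claim_is_incorrect, claim_entailed_by_chunks)
        if wrong and entailed
    )
    return misled / len(claim_is_incorrect)


# Three claims; only the second is incorrect and supported by a relevant chunk.
example_score = noise_sensitivity([False, True, False], [True, True, False])
```

A lower value is better: it means fewer of the answer's mistakes can be traced back to the retrieved context.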

sahusiddharth avatar Aug 14 '24 09:08 sahusiddharth

@sahusiddharth I did notice that. I thought that using relevant might be more useful, but now I think it would be better if the user could switch between the two using an argument. Can you modify it to include that behavior? Then we can add documentation for both on the same page.

shahules786 avatar Aug 14 '24 10:08 shahules786

I’m happy to make the necessary modifications. I would appreciate some additional guidance on how best to return the results when asked for both. Since the output could be returned in multiple types of data, such as dictionaries, named tuples, or tuples, I’m considering the most appropriate format.

{'noise_sensitivity_relevant': 0.0, 'noise_sensitivity_irrelevant': 0.0}

I was thinking of returning only the number when a specific one is requested.

Could you please advise on the preferred format for returning these results?

sahusiddharth avatar Aug 14 '24 10:08 sahusiddharth

@sahusiddharth Thanks for asking that. I think "both" should not be an option: it should be either 'relevant' or 'irrelevant', and by default it should stay as 'relevant'. In upcoming versions, we will introduce caching to avoid the LLM recalculating the same intermediate results, as in this case, where someone who wants both has to make two calls.

Make sure that when you write the doc, you give credit to RAGChecker by citing the work.
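
The single-choice argument described above could look roughly like the sketch below. The class name `NoiseSensitivityConfig`, the field name `focus`, and the `select` helper are hypothetical illustrations, not the names used in the PR; only the two allowed values and the 'relevant' default come from the discussion.

```python
from dataclasses import dataclass
from typing import Literal

Focus = Literal["relevant", "irrelevant"]


@dataclass
class NoiseSensitivityConfig:
    """Hypothetical holder for the 'relevant' / 'irrelevant' switch."""

    focus: Focus = "relevant"  # default suggested in the discussion

    def __post_init__(self) -> None:
        # Reject anything else, including a "both" option.
        if self.focus not in ("relevant", "irrelevant"):
            raise ValueError(
                f"focus must be 'relevant' or 'irrelevant', got {self.focus!r}"
            )

    @property
    def name(self) -> str:
        # Mirrors the metric names used in the example at the top of the thread.
        return f"noise_sensitivity_{self.focus}"

    def select(self, relevant_score: float, irrelevant_score: float) -> float:
        # Always return a single float, never a tuple or a dict.
        return relevant_score if self.focus == "relevant" else irrelevant_score


default_cfg = NoiseSensitivityConfig()
irrelevant_cfg = NoiseSensitivityConfig(focus="irrelevant")
```

With this shape, a user who wants both numbers builds two instances and evaluates twice, which is the trade-off accepted above until caching is available.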

shahules786 avatar Aug 14 '24 10:08 shahules786

@shahules786, I'm almost done with the documentation. To properly show the power of noise sensitivity, the example is getting long. Do you have a problem with that?

sahusiddharth avatar Aug 14 '24 13:08 sahusiddharth

@sahusiddharth We can refine it later, but I also show some basic examples here

shahules786 avatar Aug 14 '24 13:08 shahules786

@shahules786, I have gone through them. I didn't like them that much: the answers generated by the LLM rarely use the information provided in the context, and I didn't find them intuitive enough.

sahusiddharth avatar Aug 14 '24 13:08 sahusiddharth

@sahusiddharth I agree, their dataset generation is naive.

shahules786 avatar Aug 14 '24 14:08 shahules786

@shahules786, I tried returning a tuple when noise sensitivity is requested for both relevant and irrelevant, but make-ci gave me this error:

ragas/src/ragas/metrics/_noise_sensitivity.py:208:15 - error: Method "_ascore" overrides class "Metric" in an incompatible manner
    Return type mismatch: base method returns type "Coroutine[Any, Any, float]", override returns type "Coroutine[Any, Any, Coroutine[Any, Any, float | Tuple[float, float]]]"
      "Coroutine[Any, Any, Coroutine[Any, Any, float | Tuple[float, float]]]" is incompatible with "Coroutine[Any, Any, float]"
        Type parameter "_ReturnT_co_nd@Coroutine" is covariant, but "Coroutine[Any, Any, float | Tuple[float, float]]" is not a subtype of "float"
          "Coroutine[Any, Any, float | Tuple[float, float]]" is incompatible with "float" (reportIncompatibleMethodOverride)
  /Users/nexus/Desktop/ankit/ragas/src/ragas/metrics/_noise_sensitivity.py:244:16 - error: Expression of type "Unknown | tuple[Unknown, Unknown]" is incompatible with return type "Coroutine[Any, Any, float | Tuple[float, float]]"
    Type "Unknown | tuple[Unknown, Unknown]" is incompatible with type "Coroutine[Any, Any, float | Tuple[float, float]]"
      "tuple[Unknown, Unknown]" is incompatible with "Coroutine[Any, Any, float | Tuple[float, float]]" (reportReturnType)

sahusiddharth avatar Aug 14 '24 18:08 sahusiddharth
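
The type-checker error quoted above has two parts. First, the base method is declared to produce a `float`, so an override that may produce a tuple is rejected. Second, the doubly nested `Coroutine[Any, Any, Coroutine[...]]` suggests the `async def` was annotated with a `Coroutine[...]` return type; for an `async def`, the annotation should be the awaited value, because the checker adds the coroutine wrapper itself. A minimal sketch of a compatible override follows; `BaseMetric`, `NoiseSensitivitySketch`, and the `row` keys are hypothetical stand-ins, not the actual ragas classes.

```python
import asyncio
from typing import Literal


class BaseMetric:
    """Hypothetical stand-in for the base class named in the error."""

    async def _ascore(self, row: dict) -> float:
        raise NotImplementedError


class NoiseSensitivitySketch(BaseMetric):
    def __init__(self, focus: Literal["relevant", "irrelevant"] = "relevant"):
        self.focus = focus

    # Annotate with the awaited type (float), not Coroutine[Any, Any, float].
    async def _ascore(self, row: dict) -> float:
        # In a real metric these two values would come from LLM calls;
        # here they are read from the row to keep the sketch runnable.
        relevant = row["relevant_score"]
        irrelevant = row["irrelevant_score"]
        chosen = relevant if self.focus == "relevant" else irrelevant
        return float(chosen)  # one float, matching the base signature


row = {"relevant_score": 0.25, "irrelevant_score": 0.5}
score = asyncio.run(NoiseSensitivitySketch("irrelevant")._ascore(row))
```

Returning a single float selected by `focus` keeps the override compatible with the base signature, which matches the decision earlier in the thread not to offer a "both" option.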

Hi @sahusiddharth @shahules786, thanks for your interest in RAGChecker. I'm a coauthor of RAGChecker. This is nice work integrating Noise Sensitivity into Ragas.

Regarding your comments on ground truth answer generation, I want to make some clarifications. We took the short answers and the annotated ground truth passages (the context) as input to generate long-form answers. Then, to ensure the answers are faithful to the context, we used RefChecker to filter out answers that contain hallucinations. So the generated answers always stick to the provided context.

Please refer to Appendix A.2 in the RAGChecker paper. Thanks!

HuXiangkun avatar Sep 29 '24 01:09 HuXiangkun

Hey @HuXiangkun, thanks a lot for clearing that up ❤️ We would love to have you in our community and to stay in touch, if you are interested 🙂

jjmachan avatar Sep 30 '24 16:09 jjmachan

Hi @jjmachan , I'm glad to stay in touch!

HuXiangkun avatar Oct 01 '24 01:10 HuXiangkun

Just sent you an email 🙂

jjmachan avatar Oct 05 '24 18:10 jjmachan

Hi @sahusiddharth, which ragas version are you using for the evaluation of the noise sensitivity metrics?

nibeditaSw avatar Nov 20 '24 11:11 nibeditaSw