ragas Inconsistent JSON Format Handling in `_calculate_average_precision()` Causes Incorrect Verdict Processing

Description

There appears to be a discrepancy in the _calculate_average_precision() method of the ContextPrecision class regarding the JSON format it expects in responses from the large language model (LLM). The method expects a JSON object that differs from the one shown in the provided few-shot examples, so the "verdict" key is not read correctly. Additionally, the current implementation does not properly handle cases where "verdict" equals 0.

Ragas Version: 0.1.3.dev2+g70d0cd5
Python Version: 3.11.0rc1

Steps to Reproduce

The LLM is instructed to return a JSON object in the following format based on the few-shot examples:

{
  "verification": {
    "reason": "The provided context directly outlines the company's priorities for Acceptance Testing, which are accurately reflected in the detailed answer.",
    "verdict": "1"
  }
}

However, the _calculate_average_precision() method expects a JSON format without the "verification" wrapper:

{
    "reason": "The provided context directly outlines the company's priorities for Acceptance Testing, which are accurately reflected in the detailed answer.",
    "verdict": "1"
}

Observed Behavior

The method does not correctly read the "verdict" key when it is nested within the "verification" object.
When "verdict" equals 0, the method's behavior is unpredictable and may not correctly process the value.

Expected Behavior

The _calculate_average_precision() method should be capable of handling both JSON formats seamlessly. The method should also accurately process "verdict" values of 0. Here's a proposed revision for the method:

# Assumes the module already imports typing as t and numpy as np, and defines a module-level logger.
def _calculate_average_precision(self, json_responses: t.List[t.Dict]) -> float:
    processed_json_responses = []  # To store processed responses for debugging
    for item in json_responses:
        if isinstance(item, dict):
            processed_json_responses.append(item)
        else:
            processed_json_responses.append({})
            print("context_precision: Non-dict item found, replacing with empty dict.")  # Handle non-dict items

    verdict_list = []
    for resp in processed_json_responses:
        # Adjusted logic to handle both formats
        if "verification" in resp and isinstance(resp["verification"], dict):
            verdict_info = resp["verification"]
        else:
            verdict_info = resp  # Handle the case where "verification" is not a separate key

        verdict_str = verdict_info.get("verdict")
        if verdict_str in ["0", "1"]:
            verdict_value = int(verdict_str)
            verdict_list.append(verdict_value)
        else:
            verdict_list.append(np.nan)
            print(f"context_precision: Missing or invalid 'verdict' in response: {resp}")  # Handle missing or unrecognized verdicts

    # The small epsilon avoids division by zero when no verdict is 1
    denominator = np.nansum(verdict_list) + 1e-10
    numerator = np.nansum(
        [
            (np.nansum(verdict_list[: i + 1]) / (i + 1)) * verdict_list[i]
            for i in range(len(verdict_list))
        ]
    )
    score = numerator / denominator

    if np.isnan(score):
        logger.warning(
            "Invalid response format. Expected a list of dictionaries with keys 'verdict'"
        )
    return score
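
To make the arithmetic in the proposed method easier to check, here is a minimal standalone sketch of the same numerator/denominator computation on plain verdict lists. The function name average_precision is hypothetical and is not part of the Ragas API; it only mirrors the formula shown above.

```python
import numpy as np


def average_precision(verdicts):
    """Hypothetical helper: same arithmetic as the proposed method, on a list of 0/1/NaN."""
    # Number of useful contexts, plus a small epsilon to avoid division by zero
    denominator = np.nansum(verdicts) + 1e-10
    # Precision at each rank, counted only at ranks where the verdict is 1
    numerator = np.nansum(
        [
            (np.nansum(verdicts[: i + 1]) / (i + 1)) * verdicts[i]
            for i in range(len(verdicts))
        ]
    )
    return numerator / denominator


mixed = average_precision([1, 0, 1])          # (1/1 + 2/3) / 2
all_zero = average_precision([0, 0])          # 0 / 1e-10
with_nan = average_precision([1, np.nan, 1])  # the NaN rank is skipped by nansum
print(mixed, all_zero, with_nan)
```

With this arithmetic, a list of all-zero verdicts gives 0.0 rather than NaN because of the epsilon in the denominator. A list containing only NaN verdicts also gives 0.0, so the np.isnan(score) check would not fire in that case.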

Additional Context

No additional context provided.

Feb 27 '24 21:02 rodralez

Hey @rodralez, thanks a lot for the detailed writeup. I have a few questions about it, if you could help me understand.

In which example is the LLM instructed to output the following?

{
  "verification": {
    "reason": "The provided context directly outlines the company's priorities for Acceptance Testing, which are accurately reflected in the detailed answer.",
    "verdict": "1"
  }
}

If you print the prompt as a string (which is the input to the LLM), you'll notice that the "verification" keyword is already appended at the end to help the model with the completion:

from ragas.metrics import context_precision
print(context_precision.context_precision_prompt.to_string())

Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output. 
Output in only valid JSON format.

question: "What can you tell me about albert Albert Einstein?"
context: "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called \"the world's most famous equation\". He received the 1921 Nobel Prize in Physics \"for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect\", a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius."
answer: "Albert Einstein born in 14 March 1879 was German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. He received the 1921 Nobel Prize in Physics for his services to theoretical physics. He published 4 papers in 1905. Einstein moved to Switzerland in 1895"
verification: {{"reason": "The provided context was indeed useful in arriving at the given answer. The context includes key information about Albert Einstein's life and contributions, which are reflected in the answer.", "verdict": "1"}}

question: "who won 2020 icc world cup?"
context: "The 2022 ICC Men's T20 World Cup, held from October 16 to November 13, 2022, in Australia, was the eighth edition of the tournament. Originally scheduled for 2020, it was postponed due to the COVID-19 pandemic. England emerged victorious, defeating Pakistan by five wickets in the final to clinch their second ICC Men's T20 World Cup title."
answer: "England"
verification: {{"reason": "the context was useful in clarifying the situation regarding the 2020 ICC World Cup and indicating that England was the winner of the tournament that was intended to be held in 2020 but actually took place in 2022.", "verdict": "1"}}

question: "What is the tallest mountain in the world?"
context: "The Andes is the longest continental mountain range in the world, located in South America. It stretches across seven countries and features many of the highest peaks in the Western Hemisphere. The range is known for its diverse ecosystems, including the high-altitude Andean Plateau and the Amazon rainforest."
answer: "Mount Everest."
verification: {{"reason": "the provided context discusses the Andes mountain range, which, while impressive, does not include Mount Everest or directly relate to the question about the world's tallest mountain.", "verdict": "0"}}

question: {question}
context: {context}
answer: {answer}
verification:

Feb 28 '24 06:02 shahules786

Hi @shahules786 ,

After conducting further tests, I agree with you. The LLM should indeed return an answer in the format:

{
    "reason": "The provided context directly outlines the company's priorities for Acceptance Testing, which are accurately reflected in the detailed answer.",
    "verdict": "1"
}

I have verified this using the OpenAI GPT-4 API, and it functions as expected. However, I've encountered issues when running multiple chains with the Azure OpenAI GPT-4 API, where the model frequently responds in the JSON format:

{
  "verification": {
    "reason": "The provided context directly outlines the company's priorities for Acceptance Testing, which are accurately reflected in the detailed answer.",
    "verdict": "1"
  }
}

Despite providing several few-shot examples, the Azure OpenAI GPT-4 API still produces responses in the incorrect format. As a result, I needed to modify the _calculate_average_precision() method.

To clarify, the problem seems to lie more with the Azure OpenAI implementation rather than with Ragas itself.
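
For anyone hitting the same wrapped responses, the unwrapping step can also be sketched as a small standalone helper, independent of Ragas internals. This is only a sketch under assumptions: the name extract_verdict is hypothetical, and accepting integer verdicts (0 or 1) in addition to the strings "0" and "1" is my own choice, not something the Ragas prompt asks for.

```python
import typing as t


def extract_verdict(response: t.Any) -> t.Optional[int]:
    """Hypothetical helper: return 0 or 1 from a flat or wrapped response, else None."""
    if not isinstance(response, dict):
        return None
    # Use the nested object when the model wrapped its answer in "verification"
    inner = response.get("verification")
    info = inner if isinstance(inner, dict) else response
    # Assumption: tolerate both "1" and 1 by comparing the string form
    verdict = str(info.get("verdict")).strip()
    return int(verdict) if verdict in ("0", "1") else None


flat = {"reason": "useful", "verdict": "1"}
wrapped = {"verification": {"reason": "not useful", "verdict": 0}}
print(extract_verdict(flat), extract_verdict(wrapped), extract_verdict("oops"))
```

Returning None for anything unrecognized leaves it to the caller to decide whether to map it to NaN, as the modified method does, or to drop the response.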

Feb 29 '24 12:02 rodralez

Hey @rodralez, can you raise a PR with your modification? I can take a look.

Mar 01 '24 20:03 shahules786

I had this issue using the GPT-4 preview model from Azure OpenAI. A quick fix that worked for me is to change the default prompts for context_precision and context_recall, since both have a similar JSON output format inconsistency. The updated prompts add an instruction for the LLM to produce the result in the expected JSON format. See the code below:

from ragas.llms.prompt import Prompt
from ragas.metrics import (
    context_precision,
    context_recall,
)

updated_context_precision_prompt = Prompt(
    name="context_precision",
    instruction="""Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output. The json output should follow the format: {{"reason": <reason>, "verdict": <verdict>}}. """,
    examples=[
        {
            "question": """What can you tell me about albert Albert Einstein?""",
            "context": """Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called "the world's most famous equation". He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect", a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius.""",
            "answer": """Albert Einstein born in 14 March 1879 was German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. He received the 1921 Nobel Prize in Physics for his services to theoretical physics. He published 4 papers in 1905. Einstein moved to Switzerland in 1895""",
            "verification": {
                "reason": "The provided context was indeed useful in arriving at the given answer. The context includes key information about Albert Einstein's life and contributions, which are reflected in the answer.",
                "verdict": "1",
            },
        },
        {
            "question": """who won 2020 icc world cup?""",
            "context": """The 2022 ICC Men's T20 World Cup, held from October 16 to November 13, 2022, in Australia, was the eighth edition of the tournament. Originally scheduled for 2020, it was postponed due to the COVID-19 pandemic. England emerged victorious, defeating Pakistan by five wickets in the final to clinch their second ICC Men's T20 World Cup title.""",
            "answer": """England""",
            "verification": {
                "reason": "the context was useful in clarifying the situation regarding the 2020 ICC World Cup and indicating that England was the winner of the tournament that was intended to be held in 2020 but actually took place in 2022.",
                "verdict": "1",
            },
        },
        {
            "question": """What is the tallest mountain in the world?""",
            "context": """The Andes is the longest continental mountain range in the world, located in South America. It stretches across seven countries and features many of the highest peaks in the Western Hemisphere. The range is known for its diverse ecosystems, including the high-altitude Andean Plateau and the Amazon rainforest.""",
            "answer": """Mount Everest.""",
            "verification": {
                "reason": "the provided context discusses the Andes mountain range, which, while impressive, does not include Mount Everest or directly relate to the question about the world's tallest mountain.",
                "verdict": "0",
            },
        },
    ],
    input_keys=["question", "context", "answer"],
    output_key="verification",
    output_type="json",
)

context_precision.context_precision_prompt = updated_context_precision_prompt   

updated_context_recall_prompt = Prompt(
    name="context_recall",
    instruction="""Given a context, and an answer, analyze each sentence in the answer and classify if the sentence can be attributed to the given context or not. Use only "1" or "0" as a binary classification for "Attributed". Output json with reason. The json output must follow the format: [{{"statement_1": <statement_1>,"reason": <reason>, "Attributed": "<attribute-score>"}}, {{"statement_2": <statement_2>,"reason": <reason>, "Attributed": "<attribute-score>"}},...]. """,
    examples=[
        {
            "question": """What can you tell me about albert Albert Einstein?""",
            "context": """Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called 'the world's most famous equation'. He received the 1921 Nobel Prize in Physics 'for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect', a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius.""",
            "answer": """Albert Einstein born in 14 March 1879 was  German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. He received the 1921 Nobel Prize in Physics for his services to theoretical physics. He published 4 papers in 1905.  Einstein moved to Switzerland in 1895""",
            "classification": [
                {
                    "statement_1": "Albert Einstein, born on 14 March 1879, was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time.",
                    "reason": "The date of birth of Einstein is mentioned clearly in the context.",
                    "Attributed": "1",
                },
                {
                    "statement_2": "He received the 1921 Nobel Prize in Physics 'for his services to theoretical physics.",
                    "reason": "The exact sentence is present in the given context.",
                    "Attributed": "1",
                },
                {
                    "statement_3": "He published 4 papers in 1905.",
                    "reason": "There is no mention about papers he wrote in the given context.",
                    "Attributed": "0",
                },
                {
                    "statement_4": "Einstein moved to Switzerland in 1895.",
                    "reason": "There is no supporting evidence for this in the given context.",
                    "Attributed": "0",
                },
            ],
        },
        {
            "question": """who won 2020 icc world cup?""",
            "context": """The 2022 ICC Men's T20 World Cup, held from October 16 to November 13, 2022, in Australia, was the eighth edition of the tournament. Originally scheduled for 2020, it was postponed due to the COVID-19 pandemic. England emerged victorious, defeating Pakistan by five wickets in the final to clinch their second ICC Men's T20 World Cup title.""",
            "answer": """England""",
            "classification": {
                "statement_1": "England won the 2022 ICC Men's T20 World Cup.",
                "reason": "From context it is clear that England defeated Pakistan to win the World Cup.",
                "Attributed": "1",
            },
        },
        {
            "question": """What is the primary fuel for the Sun?""",
            "context": """NULL""",
            "answer": """Hydrogen""",
            "classification": {
                "statement_1": "The Sun's primary fuel is hydrogen.",
                "reason": "The context contains no information",
                "Attributed": "0",
            },
        },
    ],
    input_keys=["question", "context", "answer"],
    output_key="classification",
    output_type="json",
)

context_recall.context_recall_prompt = updated_context_recall_prompt

# Print the prompt to make sure it's updated
print(context_recall.context_recall_prompt.to_string())
print(context_precision.context_precision_prompt.to_string())

Mar 14 '24 15:03 thu-pham