ValueError: The metric [context_recall] that that is used requires the following additional columns ['reference'] to be present in the dataset.

Status: Open · mariamas-rakbank opened this issue 1 year ago · 12 comments

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

I am getting the following error while executing the code below:

ValueError: The metric [context_recall] that that is used requires the following additional columns ['reference'] to be present in the dataset.

Code Examples

from ragas import evaluate

result = evaluate(
    question_dataset,
    metrics=metrics,
)

result

Additional context

Anything else you want to share with us?

mariamas-rakbank, Oct 11 '24 09:10

Hey @mariamas-rakbank! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

To resolve the ValueError related to the 'context_recall' metric, ensure your dataset includes a column named 'reference'. This column is necessary for the LLMContextRecall metric, which requires it for the SINGLE_TURN metric type [1].

The 'reference' column should contain text strings with claims to be compared against the retrieved contexts, like:

reference="The Eiffel Tower is located in Paris."

Make sure your dataset includes 'reference' along with 'user_input' and 'retrieved_contexts' [2].
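To catch this kind of error before calling evaluate, a minimal pre-flight check might look like the sketch below. It is plain Python and does not call the ragas API; METRIC_REQUIRED_COLUMNS is a hypothetical table assembled from the v0.2 column names discussed in this thread, not the library's own definition, so verify it against the ragas version you have installed.

```python
# Hypothetical mapping built from the column names discussed in this thread
# (ragas v0.2 naming). It is NOT read from the ragas library itself.
METRIC_REQUIRED_COLUMNS = {
    "context_recall": {"user_input", "retrieved_contexts", "reference"},
    "faithfulness": {"user_input", "response", "retrieved_contexts"},
}


def missing_columns(record: dict, metric_names: list[str]) -> dict[str, list[str]]:
    """Return, per metric, the required columns absent from one dataset row."""
    report = {}
    for name in metric_names:
        required = METRIC_REQUIRED_COLUMNS.get(name, set())
        absent = sorted(required - set(record))
        if absent:
            report[name] = absent
    return report


row = {
    "user_input": "Where is the Eiffel Tower?",
    "retrieved_contexts": ["The Eiffel Tower is in Paris."],
}
print(missing_columns(row, ["context_recall", "faithfulness"]))
# {'context_recall': ['reference'], 'faithfulness': ['response']}
```

An empty dict means every listed metric found the columns this table expects.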

To continue talking to Dosu, mention @dosu.

dosubot[bot], Oct 11 '24 09:10

@dosu

In this documentation, it is mentioned that we should have the following columns:
features: ['question', 'ground_truth', 'answer', 'contexts'], and I have these available. Anyway, I have renamed the column from contexts to references.

Now I'm getting another error:

ValueError: The metric [faithfulness] that that is used requires the following additional columns ['retrieved_contexts'] to be present in the dataset.

mariamas-rakbank, Oct 11 '24 13:10

To resolve the ValueError related to the 'faithfulness' metric, ensure your dataset includes a column named 'retrieved_contexts'. This column should contain the context data needed for the faithfulness evaluation. Here's an example of how your dataset should be structured:

import pandas as pd

# Example dataset
data = {
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "retrieved_contexts": [["Paris is the capital city of France."]],
    "ground_truth": [["Paris is the capital city of France."]]
}

df = pd.DataFrame(data)

Make sure each entry in the 'retrieved_contexts' column is a list of strings. This setup is crucial for calculating the faithfulness score, which measures the factual consistency of the generated answer against the given context [1].
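The list-of-strings requirement can be checked row by row before building the dataset. This is a plain-Python sketch; the helper name bad_context_rows is hypothetical and not part of ragas.

```python
def bad_context_rows(retrieved_contexts: list) -> list[int]:
    """Return the indices of rows whose value is not a list of strings."""
    bad = []
    for i, value in enumerate(retrieved_contexts):
        is_list_of_str = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not is_list_of_str:
            bad.append(i)
    return bad


column = [
    ["Paris is the capital city of France."],  # OK: list of strings
    "Paris is the capital city of France.",    # wrong: bare string
    [],                                         # OK: empty list
]
print(bad_context_rows(column))  # [1]
```

With a DataFrame, the same check can be run as bad_context_rows(df["retrieved_contexts"].tolist()).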

dosubot[bot], Oct 11 '24 14:10

@dosu So do I need 4 columns or 5? Earlier you mentioned I should have a reference column as well.

# Example dataset
data = {
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "retrieved_contexts": [["Paris is the capital city of France."]],
    "ground_truth": [["Paris is the capital city of France."]]
}

mariamas-rakbank, Oct 11 '24 14:10

Depending on which metrics you use, you need up to five columns in your dataset (RAGAS v0.2 naming):

  1. user_input (equivalent to your question)
  2. retrieved_contexts (equivalent to your contexts)
  3. reference (equivalent to your ground_truth)
  4. response (equivalent to your answer)
  5. reference_contexts (only if needed for non-LLM context precision with reference) [1][2].

Make sure your dataset includes these columns to avoid errors.
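Using the equivalences listed above, a v0.1-style row can be renamed to v0.2 names. This is a minimal plain-Python sketch: the helper to_v2_names is hypothetical rather than a ragas function, and the contexts → retrieved_contexts entry is an assumption based on the faithfulness error earlier in this thread.

```python
# v0.1 column name -> v0.2 column name. The question/ground_truth/answer entries
# follow the equivalences listed above; contexts -> retrieved_contexts is assumed.
V1_TO_V2 = {
    "question": "user_input",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
    "answer": "response",
}


def to_v2_names(row: dict) -> dict:
    """Return a copy of row with v0.1 keys renamed; other keys are kept as-is."""
    return {V1_TO_V2.get(key, key): value for key, value in row.items()}


old_row = {
    "question": "What is the capital of France?",
    "answer": "The capital of France is Paris.",
    "contexts": ["Paris is the capital city of France."],
    "ground_truth": "Paris is the capital city of France.",
}
print(sorted(to_v2_names(old_row)))
# ['reference', 'response', 'retrieved_contexts', 'user_input']
```

For a pandas DataFrame, df.rename(columns=V1_TO_V2) does the same. If a row already has both ground_truth and reference, whichever key comes later in the dict overwrites the other, so it is safer to keep only one of them.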

dosubot[bot], Oct 11 '24 14:10

This is still not working. I am now getting the following error:

ValidationError: 1 validation error for SingleTurnSample retrieved_contexts value is not a valid list (type=type_error.list)

mariamas-rakbank, Oct 11 '24 15:10

Ensure that the retrieved_contexts column in your dataset is a list of strings. Each entry should be a list, even if it contains only one element or is empty. Here's how you can structure it:

import pandas as pd

# Example dataset
data = {
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "retrieved_contexts": [["Paris is the capital city of France."]],  # Ensure this is a list
    "ground_truth": [["Paris is the capital city of France."]],
    "reference": ["Paris is the capital city of France."]
}

df = pd.DataFrame(data)

# Convert to list if not already
df['retrieved_contexts'] = df['retrieved_contexts'].apply(lambda x: x if isinstance(x, list) else [x])

Make sure that retrieved_contexts is always a list to avoid the ValidationError [1].
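One situation the conversion above does not cover (an assumption about a common cause, not something confirmed in this thread) is a dataset loaded from CSV: pandas writes a list cell to CSV as its text representation, so it comes back as a string like "['Paris ...']", and wrapping that in [x] yields a one-element list holding the bracketed text. A sketch that recovers the real list with the standard ast.literal_eval; the helper name parse_contexts is hypothetical.

```python
import ast


def parse_contexts(value):
    """Turn a stringified list such as "['a', 'b']" back into a real list.

    Real lists pass through unchanged; any other string is wrapped in a list.
    Non-string, non-list values (e.g. NaN) raise TypeError so they get noticed.
    """
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("[") and text.endswith("]"):
            try:
                parsed = ast.literal_eval(text)
            except (ValueError, SyntaxError):
                parsed = None  # not a valid Python literal; treat as plain text
            if isinstance(parsed, list):
                return [str(item) for item in parsed]
        return [value]
    raise TypeError(f"unexpected retrieved_contexts value: {value!r}")


print(parse_contexts("['Paris is the capital city of France.']"))
# ['Paris is the capital city of France.']
```

Applied to a DataFrame: df["retrieved_contexts"] = df["retrieved_contexts"].apply(parse_contexts).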

dosubot[bot], Oct 11 '24 15:10

Ensure that the reference field in your dataset is a string. The SingleTurnSample class expects reference to be of type str. Here's how you can check and correct it:

import pandas as pd

# Example dataset
data = {
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "retrieved_contexts": [["Paris is the capital city of France."]],
    "ground_truth": [["Paris is the capital city of France."]],
    "reference": ["Paris is the capital city of France."]  # Ensure this is a string
}

df = pd.DataFrame(data)

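# Convert to string if not already; join list items rather than calling str() on a list,
# which would keep the brackets and quotes in the text
df['reference'] = df['reference'].apply(lambda x: " ".join(map(str, x)) if isinstance(x, list) else str(x))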

Make sure the reference column contains strings to avoid the ValidationError [1].
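To see which rows actually hold a non-string value before converting anything, a small plain-Python report can help; the helper name non_string_rows is hypothetical. One hedged observation: the example data in this thread stores ground_truth as a list ([["..."]]), so if ground_truth is what ends up being read as reference, each of those rows would fail a str check like this one.

```python
def non_string_rows(column: list) -> list[tuple[int, str]]:
    """Return (row index, type name) for every value in column that is not a str."""
    return [
        (i, type(value).__name__)
        for i, value in enumerate(column)
        if not isinstance(value, str)
    ]


ground_truth = [["Paris is the capital city of France."], "Berlin is in Germany."]
print(non_string_rows(ground_truth))  # [(0, 'list')]
```

With a DataFrame: non_string_rows(df["reference"].tolist()).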

dosubot[bot], Oct 11 '24 17:10

The issue still persists, @dosu.

mariamas-rakbank, Oct 11 '24 17:10

Ensure that the reference field in your dataset is a string and not a list or any other type. Double-check your dataset to confirm that each entry in the reference column is a plain string. Here's a quick check:

import pandas as pd

# Example dataset
data = {
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "retrieved_contexts": [["Paris is the capital city of France."]],
    "ground_truth": [["Paris is the capital city of France."]],
    "reference": ["Paris is the capital city of France."]  # Ensure this is a string
}

df = pd.DataFrame(data)

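# Convert to string if not already; join list items rather than calling str() on a list,
# which would keep the brackets and quotes in the text
df['reference'] = df['reference'].apply(lambda x: " ".join(map(str, x)) if isinstance(x, list) else str(x))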

If the issue persists, verify that there are no hidden characters or formatting issues in the reference column.
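The hidden-character check suggested above can be approximated with the standard unicodedata module. This sketch flags characters whose Unicode category starts with "C" (control and format characters, which include zero-width spaces and byte-order marks); the helper name and the choice of categories are assumptions, not a ragas feature.

```python
import unicodedata


def hidden_chars(text: str) -> list[str]:
    """Return the Unicode names of control/format characters found in text.

    Ordinary whitespace (space, tab, newline) is ignored; characters without
    a Unicode name are reported as U+XXXX.
    """
    found = []
    for ch in text:
        if ch in " \t\n":
            continue
        if unicodedata.category(ch).startswith("C"):
            found.append(unicodedata.name(ch, f"U+{ord(ch):04X}"))
    return found


print(hidden_chars("Paris\u200b is the capital city of France.\ufeff"))
# ['ZERO WIDTH SPACE', 'ZERO WIDTH NO-BREAK SPACE']
```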

dosubot[bot], Oct 11 '24 17:10

Same problem here.

It only occurs when I use OpenAI; with other models such as Zhipuai, it does not appear. My RAGAS version is 0.1.20.

I found another issue that mentions this error (https://github.com/explodinggradients/ragas/issues/1299). It says that “ContextPrecision required reference (v0.2 naming)/ground_truth(v0.1 naming) to be present in the dataset” and that this error is actually a bug in 0.017.

However, as I mentioned, my version of RAGAS is 0.1.20 and this problem still exists.
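Since the thread mixes v0.1 column names (ground_truth) and v0.2 names (reference), it may help to confirm which ragas version the running environment actually imports. A minimal sketch using the standard importlib.metadata module; the helper naming_for, and the rule that versions below 0.2 use v0.1 names, are assumptions based on the naming quoted above.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def naming_for(version_string: str) -> str:
    """Guess the column naming: "v0.1" for versions below 0.2, else "v0.2"."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return "v0.1" if (major, minor) < (0, 2) else "v0.2"


try:
    installed = version("ragas")  # version of the distribution actually installed
    print(installed, naming_for(installed))
except PackageNotFoundError:
    print("ragas is not installed in this environment")
```

If the printed version differs from the one you expect (for example, a notebook kernel using another environment), the column names you pass may not match what that version validates.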

Qinlilseven, Oct 12 '24 06:10

So did it work for you, @Qinlilseven?

mariamas-rakbank, Oct 12 '24 11:10