ValueError: The metric [context_recall] that is used requires the following additional columns ['reference'] to be present in the dataset.
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
I am getting this error while executing: ValueError: The metric [context_recall] that is used requires the following additional columns ['reference'] to be present in the dataset.
Code Examples

from ragas import evaluate

result = evaluate(
    question_dataset,
    metrics=metrics,
)
result
Additional context

Anything else you want to share with us?
Hey @mariamas-rakbank! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!
To resolve the ValueError related to the 'context_recall' metric, ensure your dataset includes a column named 'reference'. This column is necessary for the LLMContextRecall metric, which requires it for the SINGLE_TURN metric type [1].
The 'reference' column should contain text strings with claims to be compared against the retrieved contexts, like:
reference="The Eiffel Tower is located in Paris."
Make sure your dataset includes 'reference' along with 'user_input' and 'retrieved_contexts' [2].
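If it helps, a quick pre-flight check along these lines can catch a missing column before calling evaluate(). Note this is plain Python with no ragas import, and the metric-to-column table is an assumption inferred from the error messages in this thread, not ragas' internal requirements:

```python
# Hypothetical metric -> required-columns table, inferred from the
# ValueErrors quoted in this thread (NOT ragas' internal mapping).
REQUIRED_COLUMNS = {
    "context_recall": ["user_input", "retrieved_contexts", "response", "reference"],
    "faithfulness": ["user_input", "retrieved_contexts", "response"],
}

def missing_columns(dataset_columns, metric_name):
    """Return the columns a metric needs but the dataset lacks."""
    needed = REQUIRED_COLUMNS.get(metric_name, [])
    return [col for col in needed if col not in dataset_columns]

# A dataset without 'reference' reproduces the error above:
print(missing_columns(["user_input", "retrieved_contexts", "response"],
                      "context_recall"))
# -> ['reference']
```

Running this over your column list before evaluation makes the failing metric/column pair explicit instead of waiting for the ValueError.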
To continue talking to Dosu, mention @dosu.
@dosu
In this documentation, it is mentioned that we should have the following columns:
features: ['question', 'ground_truth', 'answer', 'contexts'],
and I have these available.
Anyhow, I have renamed the column from contexts to references.
Now I'm getting another error.
ValueError: The metric [faithfulness] that is used requires the following additional columns ['retrieved_contexts'] to be present in the dataset.
To resolve the ValueError related to the 'faithfulness' metric, ensure your dataset includes a column named 'retrieved_contexts'. This column should contain the context data needed for the faithfulness evaluation. Here's an example of how your dataset should be structured:
import pandas as pd
# Example dataset
data = {
"question": ["What is the capital of France?"],
"answer": ["The capital of France is Paris."],
"retrieved_contexts": [["Paris is the capital city of France."]],
"ground_truth": [["Paris is the capital city of France."]]
}
df = pd.DataFrame(data)
Make sure each entry in the 'retrieved_contexts' column is a list of strings. This setup is crucial for calculating the faithfulness score, which measures the factual consistency of the generated answer against the given context [1].
@dosu So do I need 4 columns or 5 columns? Earlier you mentioned I should have a reference column as well.
Example dataset

data = {
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "retrieved_contexts": [["Paris is the capital city of France."]],
    "ground_truth": [["Paris is the capital city of France."]]
}
You need five columns in your dataset for the metrics in RAGAS. The required columns are:
- user_input (equivalent to your question)
- retrieved_contexts
- reference (equivalent to your ground_truth)
- response (equivalent to your answer)
- reference_contexts (if needed for non-LLM context precision with reference) [1][2]
Make sure your dataset includes these columns to avoid errors.
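As a sketch, the v0.1 names can be mapped onto the v0.2 names with a simple rename. This uses plain dicts so it works regardless of how you hold the data; with pandas you would pass the same mapping to df.rename(columns=...):

```python
# v0.1 -> v0.2 column-name mapping, per the equivalences listed above.
V1_TO_V2 = {
    "question": "user_input",
    "contexts": "retrieved_contexts",
    "answer": "response",
    "ground_truth": "reference",
}

def rename_row(row):
    """Rename one row's keys from v0.1 names to v0.2 names."""
    return {V1_TO_V2.get(key, key): value for key, value in row.items()}

row = {
    "question": "What is the capital of France?",
    "contexts": ["Paris is the capital city of France."],
    "answer": "The capital of France is Paris.",
    "ground_truth": "Paris is the capital city of France.",
}
renamed = rename_row(row)
print(sorted(renamed))
# -> ['reference', 'response', 'retrieved_contexts', 'user_input']
```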
this is still not working. I am getting the following error now.
ValidationError: 1 validation error for SingleTurnSample
retrieved_contexts
  value is not a valid list (type=type_error.list)
Ensure that the retrieved_contexts column in your dataset is a list of strings. Each entry should be a list, even if it contains only one element or is empty. Here's how you can structure it:
import pandas as pd
# Example dataset
data = {
"question": ["What is the capital of France?"],
"answer": ["The capital of France is Paris."],
"retrieved_contexts": [["Paris is the capital city of France."]], # Ensure this is a list
"ground_truth": [["Paris is the capital city of France."]],
"reference": ["Paris is the capital city of France."]
}
df = pd.DataFrame(data)
# Convert to list if not already
df['retrieved_contexts'] = df['retrieved_contexts'].apply(lambda x: x if isinstance(x, list) else [x])
Make sure that retrieved_contexts is always a list to avoid the ValidationError [1].
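A small shape check (plain Python, hypothetical helper name) can confirm every cell matches what the ValidationError expects before the samples are built:

```python
# Validate the shape the ValidationError above complains about:
# each retrieved_contexts cell must be a list of strings.
def is_valid_contexts(cell):
    return isinstance(cell, list) and all(isinstance(item, str) for item in cell)

assert is_valid_contexts(["Paris is the capital city of France."])   # ok
assert is_valid_contexts([])                                         # empty list is ok
assert not is_valid_contexts("Paris is the capital city of France.") # bare string fails
assert not is_valid_contexts([["nested list"]])                      # nested list fails
```

Running is_valid_contexts over the whole column pinpoints the offending rows instead of failing on the first one.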
Ensure that the reference field in your dataset is a string. The SingleTurnSample class expects reference to be of type str. Here's how you can check and correct it:
import pandas as pd
# Example dataset
data = {
"question": ["What is the capital of France?"],
"answer": ["The capital of France is Paris."],
"retrieved_contexts": [["Paris is the capital city of France."]],
"ground_truth": [["Paris is the capital city of France."]],
"reference": ["Paris is the capital city of France."] # Ensure this is a string
}
df = pd.DataFrame(data)
# Convert to string if not already
df['reference'] = df['reference'].apply(lambda x: str(x) if not isinstance(x, str) else x)
Make sure the reference column contains strings to avoid the ValidationError [1].
The issue still persists @dosu
Ensure that the reference field in your dataset is a string and not a list or any other type. Double-check your dataset to confirm that each entry in the reference column is a plain string. Here's a quick check:
import pandas as pd
# Example dataset
data = {
"question": ["What is the capital of France?"],
"answer": ["The capital of France is Paris."],
"retrieved_contexts": [["Paris is the capital city of France."]],
"ground_truth": [["Paris is the capital city of France."]],
"reference": ["Paris is the capital city of France."] # Ensure this is a string
}
df = pd.DataFrame(data)
# Convert to string if not already
df['reference'] = df['reference'].apply(lambda x: str(x) if not isinstance(x, str) else x)
If the issue persists, verify that there are no hidden characters or formatting issues in the reference column.
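The "hidden characters" suggestion can be made concrete with a small normalisation pass. clean_reference is a hypothetical helper, and the failure shapes it handles are only the ones seen in this thread:

```python
import unicodedata

def clean_reference(value):
    """Coerce a reference cell to a plain, stripped str.

    Handles two failure shapes from this thread: a one-element list
    instead of a string, and invisible whitespace characters.
    """
    if isinstance(value, list):          # unwrap e.g. ["text"]
        value = value[0] if value else ""
    # NFKC folds characters like NBSP (U+00A0) into plain spaces.
    text = unicodedata.normalize("NFKC", str(value))
    return text.strip()

messy = ["Paris is the capital city of France.\u00a0",
         ["Paris is the capital city of France."]]
cleaned = [clean_reference(v) for v in messy]
# both entries normalise to the same plain string
```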
Same question.
And it only occurs when I use OpenAI. When I choose other models such as Zhipuai, it does not appear. My RAGAS version is 0.1.20.
I found another issue that mentions this error; it said that "ContextPrecision requires reference (v0.2 naming) / ground_truth (v0.1 naming) to be present in the dataset" and that this error is actually a bug in 0.017.
https://github.com/explodinggradients/ragas/issues/1299
However, as I mentioned, my version of RAGAS is 0.1.20 and this problem still exists.
So did it work for you, @Qinlilseven?