Evaluation Dataset - Ground truth Warnings
Hi,
I would like to report a potential issue with the dataset format expected by the evaluation method (in evaluation.py).
I got the evaluation method working with the following dataset:
Dataset.from_dict(
    {
        'question': list,
        'contexts': list[list],
        'ground_truths': list[list],
        'answer': list
    }
)
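To make the schema above concrete, here is a minimal sketch of a dict in that shape. The sample values are hypothetical; only the column names and nesting follow the schema. According to this report, passing such a dict to `Dataset.from_dict` under ragas 0.1.0 is what produces the deprecation warning. That call is left commented out so the sketch runs without the `datasets` package installed.

```python
# Hypothetical sample values; only the column names and nesting follow the schema above.
data_old = {
    "question": ["What is the capital of France?"],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truths": [["Paris", "The capital of France is Paris."]],
    "answer": ["Paris is the capital of France."],
}

# Sanity-check the shapes: every column has one entry per sample,
# and the list[list] columns hold lists of strings.
n = len(data_old["question"])
assert all(len(col) == n for col in data_old.values())
for key in ("contexts", "ground_truths"):
    assert all(
        isinstance(row, list) and all(isinstance(s, str) for s in row)
        for row in data_old[key]
    )

# With the Hugging Face `datasets` package installed:
# from datasets import Dataset
# dataset = Dataset.from_dict(data_old)
```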
However, I was receiving the following warning every time the pipeline ran:
passing column names as 'ground_truths' is deprecated and will be removed in the next version, please use 'ground_truth' instead. Note that `ground_truth` should be of type string and not Sequence[string] like `ground_truths`
However, according to the documentation here and the evaluation docstrings here, the ground_truth key is expected to be a list[list].
So, is this warning correct? If so, should we simply concatenate the list of strings into a single string, and with what separator (space, comma)?
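One way to try the concatenation idea is sketched below: a hypothetical helper (not part of ragas) that replaces the deprecated `ground_truths` column with a `ground_truth` column by joining each row's references. The separator is an assumption, since the thread does not settle which one to use; keeping only the first reference per row would be another reasonable choice.

```python
from typing import Dict


def migrate_ground_truths(data: Dict[str, list], sep: str = " ") -> Dict[str, list]:
    """Return a copy of `data` where 'ground_truths' (list[list[str]]) is
    replaced by 'ground_truth' (list[str]).

    Hypothetical helper for illustration; the `sep` default is an assumption.
    """
    if "ground_truths" not in data:
        return dict(data)
    migrated = {k: v for k, v in data.items() if k != "ground_truths"}
    # Join each sample's list of reference answers into one string.
    migrated["ground_truth"] = [sep.join(refs) for refs in data["ground_truths"]]
    return migrated


old = {
    "question": ["What is the capital of France?"],
    "ground_truths": [["Paris.", "It is Paris."]],
}
new = migrate_ground_truths(old)
print(new["ground_truth"])  # ['Paris. It is Paris.']
```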
Then, in a quick experiment, I renamed the key from ground_truths to ground_truth while still passing a list[list], and it did not work. I guess this makes the example given here invalid.
Thanks! I'm happy to provide any additional info!
I am using: ragas==0.1.0
It seems the documentation needs to be updated. Thanks for pointing it out.
The correct format is:
Dataset.from_dict(
    {
        'question': list,
        'contexts': list[list],
        'ground_truth': list,  # one string per sample
        'answer': list
    }
)
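As an illustration of the corrected format, the sketch below checks a plain dict against it before it is handed to `Dataset.from_dict`: `ground_truth`, `question` and `answer` hold one string per sample, while `contexts` stays a list of strings per sample. The helper name and messages are hypothetical; this is not ragas's own validation.

```python
from typing import Dict, List

# Column layout per the reply above.
STRING_COLUMNS = ("question", "ground_truth", "answer")
LIST_COLUMNS = ("contexts",)


def validate_eval_columns(data: Dict[str, list]) -> List[str]:
    """Return the problems found in `data` (empty list if it matches).

    Hypothetical helper for illustration only.
    """
    problems = []
    for key in STRING_COLUMNS + LIST_COLUMNS:
        if key not in data:
            problems.append(f"missing column '{key}'")
    if "ground_truths" in data:
        problems.append("deprecated column 'ground_truths'; use 'ground_truth'")
    # Scalar columns: each sample must be a single string.
    for key in STRING_COLUMNS:
        for i, value in enumerate(data.get(key, [])):
            if not isinstance(value, str):
                problems.append(f"{key}[{i}] should be str, got {type(value).__name__}")
    # List columns: each sample must be a list of strings.
    for key in LIST_COLUMNS:
        for i, value in enumerate(data.get(key, [])):
            if not (isinstance(value, list) and all(isinstance(s, str) for s in value)):
                problems.append(f"{key}[{i}] should be list[str]")
    return problems


good = {
    "question": ["What is the capital of France?"],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris is the capital of France."],
    "answer": ["Paris."],
}
# Same data, but ground_truth still passed as list[list], as tried in the issue.
bad = dict(good, ground_truth=[["Paris is the capital of France."]])
print(validate_eval_columns(good))  # []
print(validate_eval_columns(bad))   # ['ground_truth[0] should be str, got list']
```

The `bad` case mirrors the experiment described in the issue: renaming the key alone is not enough, because each `ground_truth` entry must also become a single string.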