Can you provide the human assessment data mentioned in the RAGAS paper?

Open awsvmaringa opened this issue 1 year ago • 2 comments

Describe the Feature Could you provide the human assessment data collected for benchmarking RAGAS metrics against human evaluations in your paper?

Why is the feature important for you? The paper only benchmarks ChatGPT against human evaluation. This feature would establish a standard dataset for benchmarking any LLM-as-judge model against human evaluation.

Additional context It would be great if you could provide a standard dataset containing the question, ground truth, context, and human labels, so that all RAGAS metrics can be benchmarked across different judge models.
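
As a rough illustration of the request above, here is a minimal sketch of how such a dataset might be used to compare a judge model with human labels. Everything in it is an assumption made for illustration: the record fields mirror the ones listed above, the binary label encoding is hypothetical, and `HumanLabeledExample` and `agreement_rate` are not part of RAGAS or the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class HumanLabeledExample:
    """One hypothetical benchmark record (field names are assumptions)."""
    question: str
    ground_truth: str
    context: str
    human_label: int  # assumed encoding: 1 = acceptable, 0 = not acceptable


def agreement_rate(
    examples: Sequence[HumanLabeledExample],
    judge: Callable[[HumanLabeledExample], int],
) -> float:
    """Fraction of examples where the judge's label equals the human label."""
    if not examples:
        raise ValueError("examples must not be empty")
    matches = sum(1 for ex in examples if judge(ex) == ex.human_label)
    return matches / len(examples)


if __name__ == "__main__":
    # Toy data only, to show the call shape; not real annotations.
    data = [
        HumanLabeledExample("q1", "gt1", "ctx1", human_label=1),
        HumanLabeledExample("q2", "gt2", "ctx2", human_label=0),
    ]
    # A stand-in judge that accepts everything; a real one would call an LLM.
    print(agreement_rate(data, judge=lambda ex: 1))
```

With a shared dataset in this shape, any judge model could be plugged in as the `judge` callable and scored the same way.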

Jul 02 '24 23:07 awsvmaringa

@awsvmaringa sorry for the delay but are you still looking for it?

Aug 08 '24 04:08 jjmachan

@jjmachan I also need it. Could you please provide the data?

Mar 16 '25 06:03 leekum2018