ragas
RAGAS metrics vs. traditional metrics
Hi!
I would like to know whether RAGAS metrics are as reliable as traditional metrics, especially traditional retrieval metrics such as precision@k, recall@k, MRR, and nDCG. Are there any studies or papers on LLM-as-a-judge evaluation that demonstrate this?
If you had to choose between LLM-as-a-judge and traditional metrics to evaluate a RAG pipeline, which would you pick?
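For reference, here is a minimal sketch of the traditional retrieval metrics named above (precision@k, recall@k, MRR, nDCG@k). It assumes binary relevance: each query comes with a hand-labeled set of relevant document IDs, which is exactly the ground truth these metrics need. The function names are illustrative only and are not part of the RAGAS API. Precision@k divides by `k` even when fewer than `k` documents are retrieved, which is one common convention.

```python
import math
from typing import List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: reciprocal rank averaged over (retrieved, relevant) pairs."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG@k with binary gains and a log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    # Ideal ranking: all relevant documents placed first (at most k of them).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2"]  # ranked retriever output (hypothetical)
    relevant = {"d1", "d2"}               # hand-labeled ground truth (hypothetical)
    print(precision_at_k(retrieved, relevant, 3))  # 1 hit in top 3 -> 1/3
    print(recall_at_k(retrieved, relevant, 3))     # 1 of 2 relevant -> 0.5
    print(reciprocal_rank(retrieved, relevant))    # first hit at rank 2 -> 0.5
    print(ndcg_at_k(retrieved, relevant, 3))
```

These scores are deterministic given the labels, which is what makes them easy to trust and compare. The trade-off is that someone has to produce the relevance labels for every query.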
Thanks