
RAG Evaluation Metrics and Recommended Thresholds

Open technqvi opened this issue 7 months ago • 0 comments

Are there any references or links for recommended thresholds for each of the metrics listed below for RAG evaluation?

I applied these metrics to a RAG project. All of the metrics below score from 0 to 1 (0%-100%); a minimal evaluation sketch follows the two lists.

RAGAS metrics:
1. Correctness
2. Semantic Similarity
3. Faithfulness
4. Response Relevancy
5. LLM Context Recall

Nvidia metrics:
1. Answer Accuracy
2. Context Relevance
3. Response Groundedness

For example, I used gpt-4o-mini to synthesize 100 questions about company policy and evaluated them using gpt-4o-mini, anthropic.claude-3-5-haiku, and gemini-2.0. The summary scores are shown in the table below.

| metric | mean_score |
| --- | --- |
| nv_accuracy | 0.74 |
| nv_context_relevance | 0.96 |
| nv_response_groundedness | 0.98 |
| answer_correctness | 0.65 |
| semantic_similarity | 0.95 |
| faithfulness | 0.93 |
| answer_relevancy | 0.97 |
| context_recall | 0.95 |
| llm_context_precision_with_reference | 0.95 |
| llm_context_precision_without_reference | 0.98 |

