
RAG Evaluation Metrics and Recommended Thresholds

Open technqvi opened this issue 7 months ago • 0 comments

Are there any references or links for recommended thresholds for each of the metrics listed below for RAG evaluation?

I applied these metrics to a RAG project. All of the metrics below score from 0 to 1 (0%-100%); a minimal evaluation sketch follows the two lists.

RAGAS metrics:
1. Correctness
2. Semantic Similarity
3. Faithfulness
4. Response Relevancy
5. LLM Context Recall

Nvidia metrics:
1. Answer Accuracy
2. Context Relevance
3. Response Groundedness

For example, I used gpt-4o-mini to synthesize 100 questions about company policy and evaluated them using gpt-4o-mini, anthropic.claude-3-5-haiku, and gemini-2.0. The summary scores are shown in the table below.

| metric | mean_score |
| --- | --- |
| nv_accuracy | 0.74 |
| nv_context_relevance | 0.96 |
| nv_response_groundedness | 0.98 |
| answer_correctness | 0.65 |
| semantic_similarity | 0.95 |
| faithfulness | 0.93 |
| answer_relevancy | 0.97 |
| context_recall | 0.95 |
| llm_context_precision_with_reference | 0.95 |
| llm_context_precision_without_reference | 0.98 |

