Questions About Metrics Changes from Ragas v0.1 to v0.2
- [x] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question

I have compared the descriptions of the metrics in the Ragas v0.1 and v0.2 documentation and have some questions:
- If all v0.1 metrics are still supported in v0.2, why are metrics like "Answer Relevance" and "Context Utilization" not mentioned in the v0.2 documentation? Is this an oversight in the documentation, or are these metrics no longer supported in v0.2?
- Building on the first question, could it be that some metrics have simply been renamed? For example, is "Answer Relevance" in v0.1 equivalent to "Response Relevancy" in v0.2? If so, could you clarify which metrics have been renamed and what their new names are?
- In v0.2, metrics are categorized under different groups (e.g., RAG, Nvidia, General Purpose). I understand this may be for better organization due to the expanded functionality, but have the definitions of any metrics changed as a result of this re-categorization?
- I have found two different images explaining the metrics. Could you confirm which one is correct or up-to-date? I have summarized my understanding in the table below:
| Metric | v0.1 | v0.2 |
|---|---|---|
| Faithfulness | V | V (RAG) |
| Answer Relevance | V | |
| Context Precision | V | V (RAG) |
| Context Utilization | V | |
| Context Recall | V | V (RAG) |
| Context Entities Recall | V | V (RAG) |
| Noise Sensitivity | V | V (RAG) |
| Answer Semantic Similarity | V | |
| Answer Correctness | V | |
| Answer Critique | V | V (General Purpose) |
| Domain Specific Evaluation | V | |
| Summarization Score | V | |
| Response Relevancy | | V (RAG) |
| Answer Accuracy | | V (Nvidia) |
| Context Relevancy | | V (Nvidia) |
| Response Groundedness | | V (Nvidia) |
| Factual Correctness | | V (Natural Language Comparison) |
| Semantic Similarity | | V (Natural Language Comparison) |
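In case it is useful, here is a minimal sketch of how one could check which of these metric classes an installed Ragas build actually exports, to compare against either documentation page. The candidate class names below are my guesses derived from the metric names in the table, and `ragas.metrics` is assumed to be the module that exports them; the helper itself only filters by attribute presence, so wrong guesses are simply omitted from the result.

```python
import importlib
import importlib.util

# Hypothetical class names, guessed from the metric names in the table above.
CANDIDATES = [
    "Faithfulness", "AnswerRelevancy", "ResponseRelevancy",
    "ContextPrecision", "ContextUtilization", "ContextRecall",
    "NoiseSensitivity", "SemanticSimilarity", "FactualCorrectness",
]

def available_metrics(module_name="ragas.metrics", names=CANDIDATES):
    """Return the subset of `names` exported by `module_name`,
    or None if the top-level package is not installed."""
    top_level = module_name.split(".")[0]
    if importlib.util.find_spec(top_level) is None:
        return None  # package not installed in this environment
    module = importlib.import_module(module_name)
    return [name for name in names if hasattr(module, name)]

print(available_metrics())
```

Running this against both a v0.1 and a v0.2 environment should show directly which names were dropped, added, or renamed, independent of the documentation.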
Thank you for your clarification!