[R-243] docs: Add documentation on the best approach to define custom metrics.
[X] I checked the documentation and related resources and couldn't find an answer to my question.
I was exploring the possibility of creating custom metrics. It seems to be possible by subclassing either Metric, MetricWithLLM, or MetricWithEmbeddings.
As an example, I created a dummy metric that computes the answer length. Note that I don't need this specific metric; I only used it to have a simple example.
import typing as t

from datasets import Dataset
from langchain_core.callbacks import Callbacks

from ragas import evaluate
from ragas.metrics.base import EvaluationMode, Metric
from ragas.run_config import RunConfig


class AnswerLength(Metric):
    """Simple example of a custom metric: scores each row by the length of its answer."""

    name: str = "answer_length"
    evaluation_mode: EvaluationMode = EvaluationMode.qa

    async def _ascore(
        self, row: t.Dict, callbacks: Callbacks, is_async: bool
    ) -> float:
        # The score is simply the character count of the answer.
        return float(len(row["answer"]))

    def init(self, run_config: RunConfig):
        """No setup needed for this metric."""


answer_length = AnswerLength()

data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': [
        'The first superbowl was held on Jan 15, 1967',
        'The most super bowls have been won by The New England Patriots',
    ],
}
dataset = Dataset.from_dict(data_samples)
score = evaluate(dataset, metrics=[answer_length])
score.to_pandas()
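For reference, here is a small stand-alone sketch (no ragas dependency, just plain Python on the same sample data) of what the metric above computes, i.e. the character length of each answer:

```python
# Same sample data as in the example above.
data_samples = {
    'question': ['When was the first super bowl?', 'Who won the most super bowls?'],
    'answer': [
        'The first superbowl was held on Jan 15, 1967',
        'The most super bowls have been won by The New England Patriots',
    ],
}

# The per-row score AnswerLength._ascore would return for each answer.
scores = [float(len(answer)) for answer in data_samples['answer']]
print(scores)  # [44.0, 62.0]
```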
My questions are:
- Do you recommend creating custom metrics with ragas, or is it preferable to rely exclusively on the pre-existing metrics that Ragas offers?
- If so, is the approach described above correct?
I am not sure whether creating custom metrics is an intended feature. I want to make sure that my custom metric implementations do not break as ragas evolves, and I would be interested in the project's vision for supporting and documenting custom metrics in the future.