Add a new `ROUGE` metric to Evidently
About Hacktoberfest contributions: https://github.com/evidentlyai/evidently/wiki/Hacktoberfest-2024
Description
The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric evaluates the quality of a generated text by comparing it to a reference text (typically a summary). It measures how much of the reference text is covered by the generated summary through n-gram overlap. Several common ROUGE variants exist:
- ROUGE-1: Measures unigram (word-level) overlap.
- ROUGE-2: Measures bigram (two-word sequence) overlap.
- ROUGE-N: Measures n-gram overlap between the candidate and reference text.
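
To make the n-gram overlap concrete, here is a minimal sketch of a recall-oriented ROUGE-N score for a single candidate/reference pair. It assumes lowercased whitespace tokenization and clipped n-gram counts. The helper names `ngrams` and `rouge_n_recall` are hypothetical and are not part of Evidently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the text is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of reference n-grams that also appear in the candidate.

    Assumption: simple lowercased whitespace tokenization; a real
    implementation may use a different tokenizer or stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    return overlap / total


print(rouge_n_recall("the cat sat on the mat", "the cat is on the mat", n=1))
print(rouge_n_recall("the cat sat on the mat", "the cat is on the mat", n=2))
```

With `n=1` this reproduces ROUGE-1 and with `n=2` ROUGE-2, so one parameterized function covers the variants listed above.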
We can implement a ROUGE metric that takes the parameter `n` and computes both the descriptor values (overlap) for each row and a summary ROUGE metric for the dataset.
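
The two outputs described here could look roughly like the sketch below: one score per row plus a single dataset-level number. It is plain Python rather than Evidently's Metric API. The names `row_score` and `rouge_summary` are hypothetical, and taking the mean of the row scores as the summary is an assumption, not something the issue specifies.

```python
from collections import Counter
from statistics import mean


def row_score(candidate, reference, n):
    """Recall-style n-gram overlap for one row (whitespace tokens, an assumption)."""
    def grams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref = grams(reference)
    total = sum(ref.values())
    return sum((grams(candidate) & ref).values()) / total if total else 0.0


def rouge_summary(candidates, references, n=1):
    """Return (per-row scores, dataset-level score).

    Assumption: the dataset-level score is the mean of the row scores.
    """
    per_row = [row_score(c, r, n) for c, r in zip(candidates, references)]
    return per_row, (mean(per_row) if per_row else 0.0)


generated = ["the cat sat on the mat", "hello world"]
reference = ["the cat is on the mat", "hello there world"]
per_row, summary = rouge_summary(generated, reference, n=1)
print(per_row, summary)
```

The per-row list corresponds to the descriptor values, while the second return value is what a dedicated dataset-level metric would report and visualize.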
Note that this implementation would require creating a new Metric (instead of defaulting to `ColumnSummaryMetric` to aggregate descriptor values) to compute and visualize the summary ROUGE score. You can check other dataset-level metrics (e.g., from classification or ranking) for inspiration.