
Add a new `WordMatch` descriptor to Evidently

Open • elenasamuylova opened this issue 5 months ago • 1 comment


About Hacktoberfest contributions: https://github.com/evidentlyai/evidently/wiki/Hacktoberfest-2024

Description:

Evidently already has an IncludesWords() descriptor that checks if the text contains any (by default) or all specified words, returning a True/False result for each row. However, this descriptor uses a single shared list of words for all rows.

In some cases, such as when evaluating responses against specific ground truth answers, you may need a different list of words for each row. For example, you might want to check if generated responses contain the expected keywords for each row:

Example:

| Question | Generated Response | Expected Words |
|---|---|---|
| "Name a primary color." | "Red is a primary color." | ["blue", "red", "yellow"] |

What to Implement:

The new WordMatch() descriptor should:

  1. Accept a with_column parameter that names the column holding the list of words specific to each row.
  2. Accept a lemmatize parameter, defaulting to True, so that inflected and variant word forms also match (same as the IncludesWords() descriptor).
  3. Allow configuring whether any or all of the words must be present (same as the IncludesWords() descriptor).
  4. Return True/False for each row, depending on whether the specified condition (any or all) is met.
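
The per-row check described above could be sketched roughly as follows. This is plain Python, not Evidently's descriptor API: the function name `word_match`, the `mode` argument, and the `naive_lemma` helper are all hypothetical, and `naive_lemma` is only a stand-in for a real lemmatizer. The choice to return False for an empty text or an empty word list is also an assumption.

```python
import re
from typing import Callable, Optional, Sequence


def naive_lemma(word: str) -> str:
    """Hypothetical stand-in for a real lemmatizer: drops one trailing 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def word_match(
    text: Optional[str],
    expected_words: Optional[Sequence[str]],
    mode: str = "any",
    lemmatize: bool = True,
    lemmatizer: Callable[[str], str] = naive_lemma,
) -> bool:
    """Check whether `text` contains any/all of the row-specific words."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    # Assumption: a missing text or an empty word list counts as no match.
    if not text or not expected_words:
        return False

    normalize = lemmatizer if lemmatize else (lambda w: w)
    # Lowercase, split into word tokens, then normalize each token.
    tokens = {normalize(t) for t in re.findall(r"[a-z0-9']+", text.lower())}
    targets = [normalize(w.lower()) for w in expected_words]

    hits = [t in tokens for t in targets]
    return any(hits) if mode == "any" else all(hits)


# The example row from the issue: one word list per response.
rows = [
    ("Red is a primary color.", ["blue", "red", "yellow"]),
]
results = [word_match(response, words) for response, words in rows]
```

In an actual descriptor, the second element of each pair would come from the column named by with_column, and the function would be applied row by row over the two columns.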

References:

  • Check the IncludesWords descriptor for vocabulary word check implementation.
  • For a two-column descriptor implementation, check the SemanticSimilarity descriptor and the CustomPairColumnEval template.

elenasamuylova • Sep 23 '24 18:09