Add a New `WordNoMatch` Descriptor to Evidently

Open elenasamuylova opened this issue 5 months ago • 1 comments

About Hacktoberfest contributions: https://github.com/evidentlyai/evidently/wiki/Hacktoberfest-2024

Description:

Evidently already has an ExcludesWords() descriptor that checks if the text does not contain any specified words from a shared list.

However, sometimes you might need to check that the text does not contain words specific to each row instead of a shared list.

| Question | Response | Forbidden Words |
| --- | --- | --- |
| "Can I cancel my subscription at any time?" | "You are allowed to cancel at any time, and we guarantee that you will receive a refund." | ["guarantee", "allowed", "refund"] |

The new WordNoMatch() descriptor should:

- Accept a with_column parameter: the column that contains the list of forbidden words specific to each row.
- Accept a lemmatize parameter (default True): when True, inflected or variant forms of the words are also considered. It works the same way as in the ExcludesWords() descriptor.
- Return True/False:
  - Return True if none of the words from the list are present in the row's text.
  - Return False if any forbidden word is present.
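
To make the expected behavior concrete, here is a minimal, library-independent sketch of the per-row check. It is not Evidently code: `word_no_match`, `toy_lemma`, and the `normalize` argument are hypothetical names used only for illustration, and `toy_lemma` is a crude stand-in for real lemmatization (the actual descriptor should reuse whatever ExcludesWords() does).

```python
import re
from typing import Callable, Iterable, Optional


def toy_lemma(word: str) -> str:
    """Hypothetical stand-in for a lemmatizer: strips a trailing 's' only."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def word_no_match(
    text: str,
    forbidden_words: Iterable[str],
    normalize: Optional[Callable[[str], str]] = None,
) -> bool:
    """Return True if none of the row's forbidden words occur in the text."""
    norm = normalize or (lambda w: w)
    # Lowercase and split the text into word tokens, then normalize each one.
    tokens = {norm(t) for t in re.findall(r"[a-z0-9']+", text.lower())}
    targets = {norm(w.lower()) for w in forbidden_words}
    return tokens.isdisjoint(targets)


# One (response, forbidden words) pair per row, as in the table above.
rows = [
    (
        "You are allowed to cancel at any time, and we guarantee "
        "that you will receive a refund.",
        ["guarantee", "allowed", "refund"],
    ),
    ("You can cancel whenever you like.", ["guarantee", "refund"]),
]
results = [word_no_match(text, words) for text, words in rows]
```

With these two rows, the first evaluates to False (forbidden words are present) and the second to True. Passing `normalize=toy_lemma` shows why the lemmatize option matters: "We process refunds quickly." passes against ["refund"] on exact matching but fails once both sides are normalized.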

Use the ExcludesWords() descriptor as a reference.
For an example of a two-column descriptor implementation, see the SemanticSimilarity descriptor and the CustomPairColumnEval template.

Sep 23 '24 18:09 elenasamuylova