
Add more deterministic/math-based assertions

Open sinedied opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe. Common math-based evaluation metrics for NLP/LLM output include ROUGE (already supported), BLEU, METEOR, GLEU, and others.

See https://github.com/Aldenhovel/bleu-rouge-meteor-cider-spice-eval4imagecaption and https://huggingface.co/spaces/evaluate-metric/google_bleu for details and examples.

Describe the solution you'd like I'd like these common evaluation metrics to be available as assertions in promptfoo.

Describe alternatives you've considered Use a custom assertion to implement them. I believe it would be beneficial to all promptfoo users to have such assertions built-in.
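For reference, the custom-assertion alternative could look roughly like the sketch below. It computes a sentence-level GLEU (Google BLEU) score as the minimum of n-gram precision and recall over orders 1 to 4, using plain whitespace tokenization. The names `ngramCounts`, `gleuScore`, `gleuAssertion`, and `GradingResultSketch` are hypothetical, and the `{ pass, score, reason }` return shape is an assumption about what a custom assertion should return; check the promptfoo custom assertion docs before relying on it.

```typescript
// Hypothetical result shape; assumed to resemble what a custom assertion returns.
interface GradingResultSketch {
  pass: boolean;
  score: number;
  reason: string;
}

// Count every n-gram of order 1..maxN in a token list.
function ngramCounts(tokens: string[], maxN: number): Map<string, number> {
  const counts = new Map<string, number>();
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= tokens.length; i++) {
      const key = tokens.slice(i, i + n).join(' ');
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

// Sentence-level GLEU: min(precision, recall) over all 1..maxN-grams.
// Tokenization is naive (lowercase + whitespace split), which is an assumption.
function gleuScore(candidate: string, reference: string, maxN = 4): number {
  const tokenize = (s: string) => s.toLowerCase().split(/\s+/).filter(Boolean);
  const cand = ngramCounts(tokenize(candidate), maxN);
  const ref = ngramCounts(tokenize(reference), maxN);

  let matches = 0;
  let candTotal = 0;
  let refTotal = 0;
  cand.forEach((count, key) => {
    candTotal += count;
    matches += Math.min(count, ref.get(key) ?? 0); // clipped match count
  });
  ref.forEach((count) => {
    refTotal += count;
  });

  if (candTotal === 0 || refTotal === 0) return 0;
  return Math.min(matches / candTotal, matches / refTotal);
}

// Wrap the score in a pass/fail result against a threshold (default is arbitrary).
function gleuAssertion(output: string, reference: string, threshold = 0.5): GradingResultSketch {
  const score = gleuScore(output, reference);
  return {
    pass: score >= threshold,
    score,
    reason: `GLEU ${score.toFixed(3)} vs threshold ${threshold}`,
  };
}
```

Every user who wants the metric would have to copy something like this into their own project and keep the tokenization consistent, which is the main argument for a built-in assertion.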

sinedied avatar Sep 23 '24 06:09 sinedied

@mldangelo Hi, I was looking at tackling this issue. Here's how I'm planning to go about this:

  1. define meteor, gleu in the enum here: https://github.com/promptfoo/promptfoo/blob/main/src/types/index.ts#L387
  2. register a corresponding handler here: https://github.com/promptfoo/promptfoo/blob/main/src/assertions/index.ts#L231
  3. define the handlers in their individual files similar to rouge.ts

Does this sound good? Let me know if you have any concerns.
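The three steps above can be sketched as a small, self-contained registry pattern. This is not promptfoo's actual code: every name here (`MetricAssertionType`, `AssertionParams`, `AssertionResult`, `makeHandler`, `handlers`, `runAssertion`) is hypothetical, and the scorer is a placeholder unigram F1, not a real METEOR or GLEU implementation.

```typescript
// Step 1 (hypothetical): add the new names to the assertion type union.
type MetricAssertionType = 'meteor' | 'gleu';

interface AssertionParams {
  output: string;
  expected: string;
  threshold: number;
}

interface AssertionResult {
  pass: boolean;
  score: number;
  reason: string;
}

type AssertionHandler = (params: AssertionParams) => AssertionResult;

// Placeholder scorer: unigram F1 with clipped counts. A real handler would
// implement the actual metric in its own file (step 3).
function unigramF1(output: string, expected: string): number {
  const out = output.toLowerCase().split(/\s+/).filter(Boolean);
  const exp = expected.toLowerCase().split(/\s+/).filter(Boolean);
  const remaining = new Map<string, number>();
  exp.forEach((t) => remaining.set(t, (remaining.get(t) ?? 0) + 1));

  let matches = 0;
  out.forEach((t) => {
    const left = remaining.get(t) ?? 0;
    if (left > 0) {
      matches++;
      remaining.set(t, left - 1);
    }
  });
  if (matches === 0) return 0;
  const precision = matches / out.length;
  const recall = matches / exp.length;
  return (2 * precision * recall) / (precision + recall);
}

// Step 3 (hypothetical): build a handler from a scoring function.
function makeHandler(name: string, scorer: (o: string, e: string) => number): AssertionHandler {
  return ({ output, expected, threshold }) => {
    const score = scorer(output, expected);
    return {
      pass: score >= threshold,
      score,
      reason: `${name} score ${score.toFixed(3)} vs threshold ${threshold}`,
    };
  };
}

// Step 2 (hypothetical): register one handler per assertion type.
const handlers: Record<MetricAssertionType, AssertionHandler> = {
  meteor: makeHandler('meteor', unigramF1), // placeholder scorer
  gleu: makeHandler('gleu', unigramF1), // placeholder scorer
};

function runAssertion(type: MetricAssertionType, params: AssertionParams): AssertionResult {
  return handlers[type](params);
}
```

Typing the registry as `Record<MetricAssertionType, AssertionHandler>` means the compiler flags any type added in step 1 that has no handler registered in step 2.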

adityabharadwaj198 avatar Apr 07 '25 21:04 adityabharadwaj198

Awesome @adityabharadwaj198!

I just opened a PR with guidance on adding a new assertion: https://github.com/promptfoo/promptfoo/pull/3610. Please feel free to leave comments on the PR if you think parts of it can be improved.

You can reference https://github.com/promptfoo/promptfoo/pull/3605, https://github.com/promptfoo/promptfoo/pull/2469, and https://github.com/promptfoo/promptfoo/pull/2081 as recent assertion PRs.

Good luck! Send me an email at michael @ promptfoo.dev when you're done and I'll send you some swag.

mldangelo avatar Apr 07 '25 23:04 mldangelo

thanks @mldangelo !

adityabharadwaj198 avatar Apr 08 '25 02:04 adityabharadwaj198

@mldangelo I opened a PR that adds the METEOR score: https://github.com/promptfoo/promptfoo/pull/3776. Would love to hear your thoughts on it!

adityabharadwaj198 avatar Apr 23 '25 14:04 adityabharadwaj198