[Feature] Population-wise Metrics

Open dovanquyet opened this issue 1 month ago • 1 comments

What feature would you like to see?

Hi DSPy team,

Thanks for your great work!

I am currently optimizing a Predict program that rates a text on a 5-point Likert scale. I am trying to design a metric that computes the Pearson correlation between the ground-truth ratings and the predictions.

However, as far as I can tell from the DSPy codebase (in particular https://github.com/stanfordnlp/dspy/blob/main/dspy/evaluate/evaluate.py#L172), the current implementation of Evaluate only supports sample-wise metrics. Is there a way to define a population-wise metric? If the codebase doesn't support this yet, I am willing to contribute.
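To make the request concrete, here is a minimal sketch of what a population-wise metric could look like, written in plain Python. It does not use any DSPy API: `pearson` and `evaluate_population` are hypothetical helpers, and the program is assumed to be a plain callable that maps a text to a numeric rating. The point is only that the metric needs all (gold, prediction) pairs at once instead of one example at a time.

```python
import math
from typing import Callable, Sequence, Tuple


def pearson(golds: Sequence[float], preds: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences of ratings."""
    n = len(golds)
    if n != len(preds) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_g = sum(golds) / n
    mean_p = sum(preds) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(golds, preds))
    var_g = sum((g - mean_g) ** 2 for g in golds)
    var_p = sum((p - mean_p) ** 2 for p in preds)
    if var_g == 0 or var_p == 0:
        # Correlation is undefined when either side is constant.
        return float("nan")
    return cov / math.sqrt(var_g * var_p)


def evaluate_population(
    program: Callable[[str], float],
    devset: Sequence[Tuple[str, float]],
    metric: Callable[[Sequence[float], Sequence[float]], float],
) -> float:
    """Hypothetical helper: collect every prediction first, then score once."""
    golds, preds = [], []
    for text, gold in devset:
        golds.append(gold)
        preds.append(program(text))
    # The metric sees the whole population, not a single example.
    return metric(golds, preds)


# Toy usage with a stand-in "program" that rates a text by its length.
toy_devset = [("a", 1.0), ("bb", 2.0), ("ccc", 3.0), ("dddd", 5.0)]
score = evaluate_population(lambda text: float(len(text)), toy_devset, pearson)
print(f"Pearson correlation: {score:.3f}")
```

A sample-wise metric cannot express this, because a correlation has no meaningful value for a single (gold, prediction) pair.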

Best regards!

Would you like to contribute?

[x] Yes, I'd like to help implement this.
[ ] No, I just want to request it.

Additional Context

No response

Nov 19 '25 15:11 dovanquyet

@isaacbmiller Mind taking a look?

Nov 20 '25 06:11 chenmoneygithub