
Evaluate should take multiple metrics

Open thomasahle opened this issue 5 months ago • 3 comments

Right now Evaluate(...) only takes one metric, but we often have several scores we want to measure at the same time, such as "accuracy", "gold_passages_retrieved", and "q/s".

While it's not obvious how to support multiple metrics for compilation, it should be easier to do for evaluation.
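To make the request concrete, here is a minimal sketch of what multi-metric evaluation could look like. It is plain Python and does not use DSPy's actual Evaluate API: `evaluate_multi`, the `(example, prediction)` metric signature, and the toy program and dataset are all hypothetical and for illustration only.

```python
from typing import Any, Callable, Dict, Iterable

# Assumption: a metric takes (example, prediction) and returns a number or bool.
Metric = Callable[[Any, Any], float]


def evaluate_multi(
    program: Callable[[Any], Any],
    devset: Iterable[Any],
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Run `program` once per example and average every metric over the devset."""
    totals = {name: 0.0 for name in metrics}
    count = 0
    for example in devset:
        prediction = program(example)  # one program call, shared by all metrics
        for name, metric in metrics.items():
            totals[name] += float(metric(example, prediction))
        count += 1
    if count == 0:
        return {name: 0.0 for name in metrics}
    return {name: total / count for name, total in totals.items()}


# Toy data and program, purely illustrative.
devset = [
    {"question": "2+2", "answer": "4", "gold": {"p1"}},
    {"question": "3+3", "answer": "6", "gold": {"p2"}},
]


def toy_program(example):
    # Always gives the same answer and passages, so it is right on one example only.
    return {"answer": "4", "passages": {"p1"}}


scores = evaluate_multi(
    toy_program,
    devset,
    metrics={
        "accuracy": lambda ex, pred: pred["answer"] == ex["answer"],
        "gold_passages_retrieved": lambda ex, pred: ex["gold"] <= pred["passages"],
    },
)
print(scores)
```

The point of the design is that the program runs once per example and every metric scores the same prediction, so adding a metric does not add program calls.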

thomasahle avatar Feb 05 '24 20:02 thomasahle

@thomasahle You are right, you have made 2 great points about Evaluate.

First, we need the key parallelism logic to be factored out so people can run parallel steps. (By the way, this will not be too hard to make work inside modules. I know the parts that need care: it's basically dspy.settings, especially dspy.settings.trace at bootstrap time.)

Second, we need to support multi-metric evaluate, which is a smaller change.
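On the first point, a factored-out parallel helper could be as small as the sketch below. This is an assumption about the shape of such a helper, not DSPy code: `parallel_map` is a hypothetical name, and it does not attempt to handle the dspy.settings state mentioned above, which is the part that needs care.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(fn: Callable[[T], R], items: Iterable[T], num_threads: int = 4) -> List[R]:
    """Apply `fn` to every item using a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in the order of `items`, not completion order.
        return list(pool.map(fn, items))


squares = parallel_map(lambda x: x * x, [1, 2, 3])
```

Threads fit here because the per-example work is typically waiting on network calls to a language model rather than CPU-bound computation.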

Can I help you do a PR? :sweat_smile:

okhat avatar Feb 06 '24 15:02 okhat

I'm happy to send some PRs. Right now I'm just a bull in a china shop hitting random obstacles, and creating issue reports to keep track of them. I don't think this one is super important to fix right now, but I just wanted to register it. If I'm creating too much spam on the issue tracker, I'm also happy to just keep a personal list of things to look into down the line :-)

If you add tags to the GitHub issue tracker, I can mark it as "nice to have" or "not important".

thomasahle avatar Feb 06 '24 19:02 thomasahle

Hello there, I am also looking for something similar to this. Are there any recent updates?

bhuvana-ak avatar Apr 12 '24 02:04 bhuvana-ak