[Agent] Implement Critic Model
What problem or use case are you trying to solve? Choosing the best of multiple candidate solutions has been tried by other SWE-bench submissions, but those strategies were generally based on prompting an existing model such as Claude. Rather than using prompt-based reranking, we trained a dedicated critic model, which we found gave more effective results.
The goal of this issue is to implement a critic model in OpenHands.
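For readers unfamiliar with the idea, here is a minimal sketch of best-of-n selection driven by a critic score. It is not the OpenHands implementation: `select_best` and `toy_score` are hypothetical names, and the scoring function is a placeholder standing in for a trained critic model.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest critic score.

    `score` is a stand-in for a critic model (hypothetical interface).
    """
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=score)


def toy_score(patch: str) -> float:
    # Placeholder critic (assumption): prefers shorter patches.
    # A real critic would be a trained model, not a length heuristic.
    return -float(len(patch))


if __name__ == "__main__":
    patches = ["aaaa", "bb", "ccc"]
    print(select_best(patches, toy_score))
```

Swapping `toy_score` for a call into a trained critic is the only change needed to move from this sketch to critic-based reranking; the selection step itself stays the same.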
Additional context Read more about the OpenHands Critic model here: https://www.all-hands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model
@xingyaoww One scenario that has come up recently is OpenHands generating more code than it needs to for a particular task. Is this the type of thing a Critic Model could help with?
☝️ i've been thinking a lot about it lately, there should be two types of critic (but ideally with the same interface):
- process-based: look at the trajectory and judge whether the agent is solving the problem with the correct process (e.g., using the right tool, taking the right steps)
- outcome-based: simply look at the output patch and judge based on that
I'm considering building a more "unified" version of the critic that can potentially do both.
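A rough sketch of what "two critics behind the same interface" could look like. Everything here is an assumption for illustration: `Episode`, `Critic`, `ProcessCritic`, `OutcomeCritic`, and `UnifiedCritic` are hypothetical names, not agent-sdk APIs, and the scoring rules are toy heuristics rather than trained models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass
class Episode:
    """Hypothetical container for one agent run."""
    actions: list[str] = field(default_factory=list)  # tool names, in order
    patch: str = ""                                   # final output patch


class Critic(Protocol):
    """Shared interface: every critic maps an episode to a score in [0, 1]."""
    def score(self, episode: Episode) -> float: ...


class ProcessCritic:
    """Toy process-based critic: fraction of actions that use allowed tools."""
    def __init__(self, allowed_tools: Sequence[str]) -> None:
        self.allowed = set(allowed_tools)

    def score(self, episode: Episode) -> float:
        if not episode.actions:
            return 0.0
        ok = sum(1 for a in episode.actions if a in self.allowed)
        return ok / len(episode.actions)


class OutcomeCritic:
    """Toy outcome-based critic: only checks that a non-empty patch exists."""
    def score(self, episode: Episode) -> float:
        return 1.0 if episode.patch.strip() else 0.0


class UnifiedCritic:
    """Weighted average of several critics, exposed through the same interface."""
    def __init__(self, critics: Sequence[Critic], weights: Sequence[float]) -> None:
        if len(critics) != len(weights) or not critics:
            raise ValueError("need one weight per critic")
        if sum(weights) <= 0:
            raise ValueError("weights must sum to a positive value")
        self.critics = list(critics)
        self.weights = list(weights)

    def score(self, episode: Episode) -> float:
        total = sum(self.weights)
        return sum(w * c.score(episode)
                   for c, w in zip(self.critics, self.weights)) / total
```

Because `UnifiedCritic` satisfies the same `Critic` protocol, callers would not need to know whether they hold a process critic, an outcome critic, or a combination of both.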
We will likely implement this inside agent-sdk, maybe we can move this issue there?
@jpelletier1 is this agent-sdk now?
@xingyaoww I'm assuming this ticket stays within the OpenHands/OpenHands project, right?
@jpelletier1 yes! we will experiment with it in SDK and eventually integrate it here