Ryan H. Tran
Ran an eval on the 30 instances above locally; the result looks reasonable (baseline got 13/30). CC @xingyaoww
No, the ordering fix doesn't go into this release. This release only contains your fix.
@xingyaoww Running eval after adding the sorting fix and this pending PR: https://github.com/All-Hands-AI/openhands-aci/pull/51, now we get 12/30 compared to 13/30:
Yeah, indeed there seems to be a relevant issue there: https://github.com/BerriAI/litellm/issues/8193
> In addition, can't we just refer to the MCP in the microagent content? How do you envision the frontmatter processing for this particular field? I think `mcp_location` is the...
Unfortunately, the trajectory in the jsonl file has no traceback. There's only one last entry in the `history` field besides the `error` field above. I can try capturing the...
Thanks for the fix! Btw, can you explain why retrying the whole eval is better? Not sure about the architectural side, but imo it may not be necessary to run...
Yeah, from my side I can see the retries happen after your fix. Recently with the new LLM proxy I don't even receive 502 errors anymore. Maybe this PR can...
Yep sounds good! Thanks for the idea, I'll have a closer look.
Hmm... Now the regex parsing causes the test with ipython code containing multiple `file_editor` calls to fail: https://github.com/All-Hands-AI/OpenHands/blob/9908e1b28525fe96394446be95fcb00785d0ca0c/tests/runtime/test_ipython.py#L278-L290