Ryan H. Tran
This PR provides a draft evaluation integration for the MINT benchmark, which tests the agent's ability to solve tasks through multi-turn interactions. This benchmark tests the agent's ability of code...
**What problem or use case are you trying to solve?** The search skills currently available to the agent are: ```python - search_dir(search_term, dir_path='./'): # Searches for a term in all...
**Short description of the problem this fixes or functionality that this introduces. This may be used for the CHANGELOG** - This PR implements a simplified multi-agent workflow inspired by the...
### Is there an existing issue for the same bug? - [X] I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting - [X] I have checked the existing issues. ### Describe...
[Bug]: (eval) Instance results with llm proxy `OpenAIException` errors got merged into output.jsonl
### Is there an existing issue for the same bug? - [X] I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting - [X] I have checked the existing issues. ### Describe...
**End-user friendly description of the problem this fixes or functionality that this introduces** - [ ] Include this change in the Release Notes. If checked, you must provide an **end-user...