Dev branch for the ToolUseAgent
Intended to be used together with this BrowserGym (bgym) PR: https://github.com/ServiceNow/BrowserGym/pull/340
Description by Korbit AI
What change is being made?
Introduce a new ToolUseAgent and supporting benchmark data, and replace existing uses of bgym.Benchmark and bgym.HighLevelActionSetArgs with the Benchmark and HighLevelActionSetArgs classes newly defined in agentlab.experiments.benchmark.
Why are these changes being made?
These changes extend the agent system with a ToolUseAgent, which chooses its actions based on tool descriptions. The new Benchmark and HighLevelActionSetArgs classes make benchmark configurations more consistent and modular, which should make it easier to extend the agent's capabilities and its test environments later on.
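For readers new to the pattern, here is a minimal, hypothetical sketch of what "choosing actions based on tool descriptions" usually means. Each action is described by a name, a short description, and a JSON-schema parameter spec, in the style used by function-calling LLM APIs. The model replies with a tool name plus JSON arguments, and the agent checks that call against the description before running it. None of the names below (`TOOLS`, `REGISTRY`, `click`, `fill`, `dispatch_tool_call`) come from AgentLab or BrowserGym; they are illustrative assumptions, and the returned strings only loosely mimic BrowserGym-style high-level actions.

```python
import json
from typing import Any, Callable


# Hypothetical action functions: they only build action strings, they do not
# touch a browser. Not the real BrowserGym implementations.
def click(bid: str) -> str:
    return f"click('{bid}')"


def fill(bid: str, value: str) -> str:
    return f"fill('{bid}', {value!r})"


# Hypothetical tool descriptions in JSON-schema style; an agent would send
# these to the LLM so it knows which actions exist and what arguments they take.
TOOLS: dict[str, dict[str, Any]] = {
    "click": {
        "description": "Click the element with the given id.",
        "parameters": {
            "type": "object",
            "properties": {"bid": {"type": "string"}},
            "required": ["bid"],
        },
    },
    "fill": {
        "description": "Type a value into the element with the given id.",
        "parameters": {
            "type": "object",
            "properties": {
                "bid": {"type": "string"},
                "value": {"type": "string"},
            },
            "required": ["bid", "value"],
        },
    },
}

# Maps each described tool to the Python callable that executes it.
REGISTRY: dict[str, Callable[..., str]] = {"click": click, "fill": fill}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Check a model-issued tool call against its description, then run it."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    required = TOOLS[name]["parameters"]["required"]
    missing = [p for p in required if p not in args]
    if missing:
        raise ValueError(f"missing arguments for {name}: {missing}")
    return REGISTRY[name](**args)
```

Usage example: `dispatch_tool_call("fill", '{"bid": "b3", "value": "hello"}')` returns `"fill('b3', 'hello')"`. Keeping the descriptions (`TOOLS`) separate from the executors (`REGISTRY`) lets the same schemas be shown to the model and reused for validation.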
This PR is stale; its changes have already been merged.