Thibault LSDC
Comes in combination with this bgym PR: https://github.com/ServiceNow/BrowserGym/pull/340

## Description by Korbit AI

### What change is being made?

Introduce a new `ToolUseAgent` and supporting benchmark data, and replace existing...
## Description by Korbit AI

### What change is being made?

Add a new debug agent in `debug_agent.py` for manual testing of actions within the browser gym environment.

### Why...
## Description by Korbit AI

### What change is being made?

Add a pytest skip marker to the `test_launch_parallel_study` test case, with a reason indicating the use case is not...
https://github.com/ServiceNow/AgentLab/blob/a228d4105047bb27fcc24e61d626e476a586572f/tests/llm/test_tracking.py#L45-L52
If a `SequentialStudies` run breaks in the middle of a Study, the studies after it cannot be relaunched, because their files were never created by a `.prepare()` call.
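The failure mode above can be illustrated with a minimal sketch. The `SketchStudy` class and `run_sequential` helper are hypothetical stand-ins, not AgentLab's actual classes; the only idea taken from the report is that every study's files should exist on disk before any study starts running, so that a crash partway through still leaves the later studies relaunchable.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchStudy:
    """Hypothetical stand-in for a study; not the AgentLab API."""

    name: str
    root: Path
    fail: bool = False  # simulate a crash during run()

    @property
    def path(self) -> Path:
        return self.root / self.name

    def prepare(self) -> None:
        # Create the on-disk files a later relaunch would need.
        self.path.mkdir(parents=True, exist_ok=True)
        (self.path / "status.txt").write_text("prepared")

    def run(self) -> None:
        if self.fail:
            raise RuntimeError(f"study {self.name} crashed")
        (self.path / "status.txt").write_text("done")


def run_sequential(studies: list[SketchStudy]) -> None:
    # Prepare every study up front, so later ones exist on disk
    # even if an earlier one breaks while running.
    for study in studies:
        study.prepare()
    for study in studies:
        study.run()


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    studies = [
        SketchStudy("a", root),
        SketchStudy("b", root, fail=True),
        SketchStudy("c", root),
    ]
    try:
        run_sequential(studies)
    except RuntimeError as err:
        print(f"interrupted: {err}")
    # Study "c" never ran, but its files exist and it can be relaunched.
    print((root / "c" / "status.txt").read_text())
```

Preparing everything first costs some disk writes before any work starts, but it separates "files exist" from "study ran", which is what a relaunch needs.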
## Description by Korbit AI

### What change is being made?

Add resample benchmark objects `ResampleBenchmark`, `AllTasksBenchmark`, and `HighVarianceBenchmark` to the `custom_benchmark.py` file.

### Why are these changes being made?...
Making updates to fit https://github.com/ServiceNow/BrowserGym/pull/291
## Benchmark tutorial

This PR serves as a template for benchmark creation. It involves the following steps:

- [ ] Creating a task object to integrate your tasks to BrowserGym...
Moving version fetching to BrowserGym. It feels too hardcoded at the moment; maybe we could deduce the package name automatically? Perhaps with some kind of regex on the package name.
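One way the regex idea might look is sketched below. It rests on an assumption not stated in the PR: that a benchmark name such as `workarena_l1` begins with an alphabetic stem, and that the installed package is called `browsergym-` followed by that stem. `guess_package_name` and `get_benchmark_version` are hypothetical helpers, not part of BrowserGym or AgentLab; only `re` and `importlib.metadata` are real standard-library modules.

```python
import re
from importlib import metadata
from typing import Optional


def guess_package_name(benchmark_name: str, prefix: str = "browsergym-") -> str:
    """Guess a package name from a benchmark name (assumed naming convention)."""
    # Keep only the leading alphabetic stem, e.g. "workarena_l1" -> "workarena".
    match = re.match(r"[a-z]+", benchmark_name.lower())
    if match is None:
        raise ValueError(f"cannot derive a package name from {benchmark_name!r}")
    return prefix + match.group(0)


def get_benchmark_version(benchmark_name: str) -> Optional[str]:
    """Return the installed version of the guessed package, or None if absent."""
    try:
        return metadata.version(guess_package_name(benchmark_name))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(guess_package_name("workarena_l1"))
    print(get_benchmark_version("workarena_l1"))
```

A stem-based guess like this would silently break for any benchmark whose package does not follow the convention, so an explicit override table may still be needed alongside it.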