Gym
Generic Aviary integration
This PR enables running Gym on Aviary environments. The two main concepts are:
- `AviaryResourcesServer`: maps to an Aviary `TaskDataset`; it spawns and manages multiple environments.
  - Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs, but an integer index into the `TaskDataset`. Otherwise we'd have data defined in two places.
  - Instead of tool-specific endpoints, we have one `/step` endpoint. This is because:
    - Aviary environments define their transition function in `step()`. Simply calling the bare tools can have undefined behavior (e.g. state isn't updated properly).
    - Aviary tools are not guaranteed to be available until `reset()` is called.
  - A `/close` endpoint is added to tear down resources.
- `AviaryAgent`: analogous to `SimpleAgent`, but:
  - The request is an integer index, which is forwarded to `AviaryResourcesServer`. In general, we expect `env.reset()` to provide the first messages, not the calling code.
  - All tool calls are sent to `/step`.
  - We rely on the environment to tell us when we're done.
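
The division of labor between the server and the agent can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not the PR's code: `ResourcesServerSketch`, `run_agent`, `CountdownEnv`, the `reset(task_idx)` entry point, and the `(observations, reward, done)` return shape are all hypothetical, and ordinary method calls stand in for the `/step` and `/close` endpoints. Aviary's real `Environment` and `TaskDataset` interfaces differ in detail (for example, `reset()` and `step()` are async there).

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Env(Protocol):
    """Hypothetical, simplified stand-in for an Aviary environment."""

    def reset(self) -> list[str]: ...  # returns the first messages

    def step(self, tool_call: dict) -> tuple[list[str], float, bool]: ...  # (observations, reward, done)

    def close(self) -> None: ...


@dataclass
class ResourcesServerSketch:
    """Hypothetical model of AviaryResourcesServer; methods stand in for endpoints."""

    make_env: Callable[[int], Env]  # integer task index -> fresh env (the TaskDataset role)
    envs: dict[int, Env] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def reset(self, task_idx: int) -> tuple[int, list[str]]:
        # Only an index crosses the boundary, so task data lives in one place.
        env = self.make_env(task_idx)
        env_id = next(self._ids)
        self.envs[env_id] = env
        # The environment, not the caller, supplies the first messages.
        return env_id, env.reset()

    def step(self, env_id: int, tool_call: dict) -> tuple[list[str], float, bool]:
        # "/step": every tool call goes through the env's transition function.
        return self.envs[env_id].step(tool_call)

    def close(self, env_id: int) -> None:
        # "/close": tear down the environment's resources.
        self.envs.pop(env_id).close()


def run_agent(server: ResourcesServerSketch, task_idx: int,
              policy: Callable[[list[str]], dict], max_steps: int = 10):
    """Hypothetical AviaryAgent-style loop; the env decides when the episode ends."""
    env_id, messages = server.reset(task_idx)
    total_reward = 0.0
    try:
        for _ in range(max_steps):
            obs, reward, done = server.step(env_id, policy(messages))
            messages = messages + obs
            total_reward += reward
            if done:
                break
    finally:
        server.close(env_id)  # release resources even if the policy raises
    return total_reward, messages


class CountdownEnv:
    """Toy env for the demo: task index n ends the episode after n `decrement` calls."""

    def __init__(self, n: int):
        self.n = n
        self.remaining = n

    def reset(self) -> list[str]:
        self.remaining = self.n
        return [f"Count down from {self.n}"]

    def step(self, tool_call: dict) -> tuple[list[str], float, bool]:
        if tool_call.get("name") != "decrement":
            return [f"Unknown tool {tool_call.get('name')!r}"], 0.0, False
        self.remaining -= 1
        done = self.remaining <= 0
        return [f"remaining={self.remaining}"], 1.0 if done else 0.0, done

    def close(self) -> None:
        pass


if __name__ == "__main__":
    server = ResourcesServerSketch(make_env=CountdownEnv)
    reward, transcript = run_agent(server, 3, lambda msgs: {"name": "decrement"})
    print(reward, transcript)
```

Routing every tool call through `step()` is what keeps environment state consistent, and closing the environment in a `finally` block means teardown happens even when an episode is cut short.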
Two concrete Aviary datasets/environments are integrated: GSM8k with a calculator environment and BixBench with a notebook environment. Adding new ones is pretty lightweight: most of the code in `notebook_app.py` goes into defining a BixBench-compatible environment, not into the integration itself.
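
To illustrate why a new dataset is cheap to add, here is a hypothetical GSM8k-style calculator environment written against a simplified `reset()`/`step()`/`close()` shape. The hard-coded `TASKS`, the `calculator` and `submit_answer` tool names, `safe_eval`, and `make_env` are assumptions for illustration only; the real integration's classes may look quite different.

```python
import ast
import operator

# Hypothetical toy tasks standing in for a GSM8k split: (question, numeric answer).
TASKS = [
    ("Tom has 3 boxes of 4 apples. How many apples does he have?", 12.0),
    ("A pen costs 2.5 and Ann buys 6 pens. What does she pay?", 15.0),
]

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / arithmetic by walking the AST instead of calling eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))


class CalculatorEnv:
    """Hypothetical GSM8k-style env with `calculator` and `submit_answer` tools."""

    def __init__(self, question: str, answer: float):
        self.question = question
        self.answer = answer

    def reset(self) -> list[str]:
        return [self.question]  # the env supplies the first message

    def step(self, tool_call: dict) -> tuple[list[str], float, bool]:
        name, arg = tool_call["name"], tool_call["arguments"]
        if name == "calculator":
            try:
                return [str(safe_eval(arg))], 0.0, False
            except (ValueError, SyntaxError, ZeroDivisionError) as exc:
                return [f"error: {exc}"], 0.0, False
        if name == "submit_answer":
            try:
                correct = abs(float(arg) - self.answer) < 1e-6
            except ValueError:
                correct = False
            # Submitting ends the episode; reward is 1.0 only for a correct answer.
            return ["correct" if correct else "incorrect"], float(correct), True
        return [f"unknown tool {name!r}"], 0.0, False

    def close(self) -> None:
        pass  # nothing to release for this toy env


def make_env(task_idx: int) -> CalculatorEnv:
    """Integer index -> fresh environment: the only integration point assumed here."""
    question, answer = TASKS[task_idx]
    return CalculatorEnv(question, answer)
```

In this sketch the only integration point is the index-to-environment function; everything else is ordinary environment code.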