Generic Aviary integration

Open sidnarayanan opened this issue 3 months ago • 1 comment

This PR enables running Gym on Aviary environments. The two main concepts:

  • AviaryResourcesServer: maps to an Aviary TaskDataset, and spawns and manages multiple environments
    • Unlike other ResourcesServers, it doesn't take arbitrary task specs, only an integer index into the TaskDataset. Otherwise we'd have data defined in two places
    • Instead of tool-specific endpoints, we have one /step endpoint. This is because:
      • Aviary environments define their transition function in step(). Calling the bare tools directly can lead to undefined behavior (e.g. state isn't updated properly)
      • Aviary tools are not guaranteed to be available until reset() is called.
    • A /close endpoint is added to tear down resources
  • AviaryAgent: analogous to SimpleAgent, but:
    • Request is an integer index (which is forwarded to AviaryResourcesServer). In general, we expect env.reset() to provide the first messages, not the calling code
    • All tool calls are sent to /step
    • We rely on the environment to tell us when we're done
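
The flow described above (integer index in, reset() supplying the first messages, every tool call routed through a single step, the environment signalling completion, then an explicit close) can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this PR: ToyCalculatorEnv, ToyResourcesServer, and run_agent are hypothetical names, the method signatures are simplified assumptions rather than the real Aviary or Gym APIs, and plain method calls stand in for the /step and /close HTTP endpoints.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCalculatorEnv:
    """Hypothetical stand-in for an Aviary environment (not the real API)."""

    expression: str
    answer: int
    tools: list[str] = field(default_factory=list)

    def reset(self) -> tuple[list[str], list[str]]:
        # Tools only become available once reset() has been called.
        self.tools = ["calculator", "submit"]
        return [f"Compute {self.expression}"], list(self.tools)

    def step(self, tool: str, arg: str) -> tuple[str, float, bool]:
        # The transition function lives here, so all tool calls go through it.
        if tool not in self.tools:
            raise ValueError(f"unknown or unavailable tool: {tool}")
        if tool == "calculator":
            left, right = arg.split("+")
            return str(int(left) + int(right)), 0.0, False
        # "submit" ends the episode; the environment decides when we are done.
        correct = int(arg) == self.answer
        return "submitted", 1.0 if correct else 0.0, True


class ToyResourcesServer:
    """Hypothetical server: owns the dataset and the live environments."""

    def __init__(self, dataset: list[tuple[str, int]]):
        self.dataset = dataset
        self.envs: dict[int, ToyCalculatorEnv] = {}
        self._ids = count()

    def reset(self, task_idx: int) -> tuple[int, list[str]]:
        # The request is only an index; the task data lives in the dataset.
        expression, answer = self.dataset[task_idx]
        env = ToyCalculatorEnv(expression, answer)
        messages, _tools = env.reset()
        env_id = next(self._ids)
        self.envs[env_id] = env
        return env_id, messages

    def step(self, env_id: int, tool: str, arg: str) -> tuple[str, float, bool]:
        return self.envs[env_id].step(tool, arg)

    def close(self, env_id: int) -> None:
        # Tear down the resources held for this environment.
        self.envs.pop(env_id, None)


def run_agent(server: ToyResourcesServer, task_idx: int) -> float:
    """Scripted 'agent': calculate once, then submit the result."""
    env_id, messages = server.reset(task_idx)
    tool, arg = "calculator", messages[0].removeprefix("Compute ")
    reward, done = 0.0, False
    try:
        while not done:
            obs, reward, done = server.step(env_id, tool, arg)
            tool, arg = "submit", obs
    finally:
        server.close(env_id)
    return reward


if __name__ == "__main__":
    toy_server = ToyResourcesServer([("2+3", 5), ("10+7", 17)])
    print(run_agent(toy_server, 0))
```

In this sketch the agent never sees the task spec directly and never invokes a tool outside step(), which mirrors the two constraints listed for AviaryResourcesServer.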

Two concrete Aviary datasets/environments are integrated: GSM8k with a calculator environment and BixBench with a notebook environment. Adding new ones is lightweight: most of the code in notebook_app.py defines a BixBench-compatible environment and is not part of the integration itself.

sidnarayanan avatar Sep 17 '25 23:09 sidnarayanan

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Sep 17 '25 23:09 copy-pr-bot[bot]