
add GymEnv

Open hallerite opened this issue 1 month ago • 3 comments

Description

Adds GymEnv, an optional Environment subclass that runs classic reset/step simulator loops. This lets you plug Gym-style environments directly into verifiers (including custom simulators defined inside this repo) without first converting them into a static dataset.

GymEnv supports:

  • Homogeneous mode (env_cls): one env class with optional env_kwargs; dataset rows feed directly into reset(**info).
  • Heterogeneous mode (env_registry): a registry of env classes; each dataset row specifies info.env_type and optional info.env_kwargs to select/configure the env per rollout.
  • Custom mode (subclass + _make_env): full control over environment construction.
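As a rough illustration of how the homogeneous and heterogeneous modes could select an env for each dataset row, here is a minimal, self-contained sketch. It does not import verifiers and is not the PR's implementation: `CountdownEnv`, `make_env`, and `run_episode` are hypothetical names, and the 3-tuple returned by `step` is a simplification of the usual Gym signature.

```python
from typing import Any, Callable, Optional


class CountdownEnv:
    """Toy reset/step simulator (hypothetical): the episode ends when value hits 0."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.value = start

    def reset(self, **info: Any) -> str:
        # Assumption: fields of the dataset row's info arrive as keyword arguments.
        self.value = int(info.get("start", self.start))
        return f"value={self.value}"

    def step(self, action: str) -> tuple[str, float, bool]:
        # Simplified return: (observation, reward, done).
        if action == "dec":
            self.value -= 1
        done = self.value <= 0
        return f"value={self.value}", (1.0 if done else 0.0), done


def make_env(
    info: dict,
    env_cls: Optional[type] = None,
    env_kwargs: Optional[dict] = None,
    env_registry: Optional[dict] = None,
) -> Any:
    """Pick and construct an env for one dataset row (illustrative only)."""
    if env_registry is not None:
        # Heterogeneous mode: the row names its env type and optional kwargs.
        cls = env_registry[info["env_type"]]
        return cls(**info.get("env_kwargs", {}))
    if env_cls is None:
        raise ValueError("provide env_cls or env_registry")
    # Homogeneous mode: one class, shared kwargs.
    return env_cls(**(env_kwargs or {}))


def run_episode(
    env: Any,
    policy: Callable[[str], str],
    reset_kwargs: dict,
    max_steps: int = 10,
) -> float:
    """Classic reset/step loop; returns the summed reward."""
    obs = env.reset(**reset_kwargs)
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


# Homogeneous: every rollout uses CountdownEnv(start=2).
homo_env = make_env({}, env_cls=CountdownEnv, env_kwargs={"start": 2})
homo_return = run_episode(homo_env, lambda obs: "dec", {})

# Heterogeneous: the row selects the env via info.env_type.
row_info = {"env_type": "countdown", "env_kwargs": {"start": 1}}
hetero_env = make_env(row_info, env_registry={"countdown": CountdownEnv})
```

Custom mode would correspond to replacing `make_env` wholesale, which is what overriding `_make_env` in a subclass is described as allowing.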

Additional features:

  • automatic dataset generation when none is provided (so RL training “just works”),
  • optional evaluation override via a user-supplied eval_runner.

GymEnv is fully opt-in and does not affect existing environments.

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Documentation update
  • [ ] Test improvement

Testing

  • [x] All existing tests pass when running uv run pytest locally.
  • [x] New tests have been added to cover the changes

Checklist

  • [x] My code follows the style guidelines of this project as outlined in AGENTS.md
  • [ ] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] Any dependent changes have been merged and published

Additional Notes

  • Dataset & trainer integration
    • Environment requires at least one of dataset or eval_dataset.
      GymEnv now handles this automatically:
      • If you provide a dataset or eval dataset explicitly, they are used as-is.
      • If you provide no datasets and keep auto_dummy_eval=True:
        • We auto-build a training dataset of length num_train_episodes via _build_auto_dataset(...).
        • In homogeneous mode (env_cls set):
          • dataset = this auto dataset.
          • eval_dataset = a 1-row dummy eval set, preserving “episodes mode” behavior where num_examples maps cleanly to rollouts_per_example.
        • In heterogeneous mode (env_registry set):
          • We auto-generate rows where each info contains a valid env_type (round-robin over registry keys) and default env_kwargs={}.
          • This auto dataset is used for both dataset and eval_dataset, ensuring all rows map cleanly into actual env instances.
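The fallback rules above can be sketched in plain Python as follows. This is only an approximation of the described behavior, using lists of dicts in place of real dataset objects. `build_auto_rows` and `resolve_datasets` are hypothetical helpers, not the PR's `_build_auto_dataset`, and raising an error when `auto_dummy_eval` is disabled is an assumption based on the note that Environment requires at least one dataset.

```python
from typing import Optional


def build_auto_rows(num_train_episodes: int, registry_keys: Optional[list] = None) -> list:
    """Build placeholder rows; in heterogeneous mode, assign env_type round-robin."""
    rows = []
    for i in range(num_train_episodes):
        info: dict = {}
        if registry_keys:
            info = {
                "env_type": registry_keys[i % len(registry_keys)],
                "env_kwargs": {},
            }
        rows.append({"info": info})
    return rows


def resolve_datasets(
    dataset: Optional[list] = None,
    eval_dataset: Optional[list] = None,
    *,
    num_train_episodes: int = 8,
    registry_keys: Optional[list] = None,
    auto_dummy_eval: bool = True,
) -> tuple:
    """Return (dataset, eval_dataset) following the rules described above."""
    if dataset is not None or eval_dataset is not None:
        # Explicitly provided datasets are used as-is.
        return dataset, eval_dataset
    if not auto_dummy_eval:
        # Assumption: with no datasets and no auto mode, construction fails.
        raise ValueError("provide dataset or eval_dataset, or enable auto_dummy_eval")
    auto = build_auto_rows(num_train_episodes, registry_keys)
    if registry_keys:
        # Heterogeneous mode: the same auto rows serve training and eval.
        return auto, auto
    # Homogeneous mode: auto training rows plus a 1-row dummy eval set.
    return auto, [{"info": {}}]


train_rows, eval_rows = resolve_datasets(num_train_episodes=4, registry_keys=["a", "b"])
homo_train, homo_eval = resolve_datasets(num_train_episodes=3)
```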

TBD

  • Demonstrate an end-to-end RL training run using RLTrainer + GymEnv (homogeneous and heterogeneous cases) to validate the integration.

hallerite avatar Nov 07 '25 20:11 hallerite

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Nov 07 '25 20:11 CLAassistant

nice! would you wanna do the PR on top of the trajectories branch? https://github.com/PrimeIntellect-ai/verifiers/pull/549

there's a bunch of new things which should make native gym-style rollouts much easier, especially for training.

also would be ideal if we still had a notion of a dataset, as this is how trainers typically expect to work with verifiers envs

the dataset can be generated at init with a given number of hidden state samples (as in textarena_env)

willccbb avatar Nov 09 '25 12:11 willccbb

yes! will do it on top of it. I'll just wait for that PR to be finished

hallerite avatar Nov 09 '25 14:11 hallerite