verifiers add GymEnv

Description

Adds GymEnv, an optional Environment subclass that runs classic reset/step simulator loops. This lets you plug Gym-style environments directly into verifiers (including custom simulators defined inside this repo) without first converting them into a static dataset.
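For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a reset/step loop. `CountdownEnv`, `run_episode`, and the three-value `step` return are hypothetical illustrations and not part of this PR; in GymEnv the policy would be a model rollout rather than a Python callable.

```python
from typing import Any, Callable


class CountdownEnv:
    """Hypothetical toy simulator: reach exactly zero by decrementing a counter."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.value = start

    def reset(self, **info: Any) -> int:
        # Dataset-style keyword arguments may override the starting value.
        self.value = int(info.get("start", self.start))
        return self.value

    def step(self, action: int) -> tuple[int, float, bool]:
        # Returns (observation, reward, done); real Gym APIs return more fields.
        self.value -= action
        done = self.value <= 0
        reward = 1.0 if self.value == 0 else 0.0
        return self.value, reward, done


def run_episode(
    env: CountdownEnv,
    policy: Callable[[int], int],
    max_steps: int = 10,
    **info: Any,
) -> float:
    """Drive one reset/step loop and return the summed reward."""
    obs = env.reset(**info)
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

Calling `run_episode(CountdownEnv(), lambda obs: 1)` steps the counter down one unit at a time until the episode ends.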

GymEnv supports:

- Homogeneous mode (env_cls): one env class with optional env_kwargs; dataset rows feed directly into reset(**info).
- Heterogeneous mode (env_registry): a registry of env classes; each dataset row specifies info.env_type and optional info.env_kwargs to select/configure the env per rollout.
- Custom mode (subclass + _make_env): full control over environment construction.
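A rough sketch of how the homogeneous and heterogeneous modes could resolve an env instance from a dataset row. This is an illustration under assumptions, not the PR's actual `_make_env`: `make_env`, `GridEnv`, and `BanditEnv` are hypothetical names, and the real constructor arguments may differ.

```python
from typing import Any, Optional


class GridEnv:
    """Hypothetical env class used only for this sketch."""

    def __init__(self, size: int = 4) -> None:
        self.size = size


class BanditEnv:
    """Hypothetical env class used only for this sketch."""

    def __init__(self, arms: int = 2) -> None:
        self.arms = arms


def make_env(
    info: dict[str, Any],
    env_cls: Optional[type] = None,
    env_kwargs: Optional[dict[str, Any]] = None,
    env_registry: Optional[dict[str, type]] = None,
) -> Any:
    """Hypothetical resolver mirroring the modes described above."""
    if env_cls is not None:
        # Homogeneous mode: one class, shared constructor kwargs.
        return env_cls(**(env_kwargs or {}))
    if env_registry is not None:
        # Heterogeneous mode: the row picks the class and its kwargs.
        env_type = info["env_type"]
        if env_type not in env_registry:
            raise KeyError(f"unknown env_type: {env_type!r}")
        return env_registry[env_type](**(info.get("env_kwargs") or {}))
    # Custom mode would replace this function entirely in a subclass.
    raise ValueError("provide env_cls or env_registry, or override the factory")
```

For example, `make_env({"env_type": "bandit", "env_kwargs": {"arms": 5}}, env_registry={"grid": GridEnv, "bandit": BanditEnv})` would build a `BanditEnv` with five arms.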

Additional features:

- automatic dataset generation when none is provided (so RL training “just works”),
- optional evaluation override via a user-supplied eval_runner.

GymEnv is fully opt-in and does not affect existing environments.

Type of Change

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Documentation update
[ ] Test improvement

Testing

[x] All existing tests pass when running uv run pytest locally.
[x] New tests have been added to cover the changes

Checklist

[x] My code follows the style guidelines of this project as outlined in AGENTS.md
[ ] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] Any dependent changes have been merged and published

Additional Notes

Dataset & trainer integration
- Environment requires at least one of dataset or eval_dataset.
  GymEnv now handles this automatically:
  - If you provide a dataset or eval_dataset explicitly, they are used as-is.
  - If you provide no datasets and keep auto_dummy_eval=True:
    - We auto-build a training dataset of length num_train_episodes via _build_auto_dataset(...).
    - In homogeneous mode (env_cls set):
      - dataset = this auto dataset.
      - eval_dataset = a 1-row dummy eval set, preserving “episodes mode” behavior where num_examples maps cleanly to rollouts_per_example.
    - In heterogeneous mode (env_registry set):
      - We auto-generate rows where each info contains a valid env_type (round-robin over registry keys) and default env_kwargs={}.
      - This auto dataset is used for both dataset and eval_dataset, ensuring all rows map cleanly into actual env instances.
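The round-robin generation for heterogeneous mode can be sketched as follows. This is a hypothetical stand-in for `_build_auto_dataset(...)`, whose real signature and row schema are not shown in this PR description; `build_auto_rows` and the bare `{"info": ...}` row shape are assumptions for illustration.

```python
from itertools import cycle, islice
from typing import Any


def build_auto_rows(
    num_train_episodes: int, registry_keys: list[str]
) -> list[dict[str, Any]]:
    """Hypothetical helper: one row per episode, env types assigned round-robin."""
    if not registry_keys:
        raise ValueError("registry must contain at least one env type")
    # cycle() repeats the keys forever; islice() cuts it to the episode count.
    env_types = islice(cycle(registry_keys), num_train_episodes)
    # Each row gets its own empty env_kwargs dict, so rows never share state.
    return [{"info": {"env_type": t, "env_kwargs": {}}} for t in env_types]
```

Because every generated row carries a registry key, each one can be mapped to a concrete env instance without extra validation.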

TBD

Demonstrate an end-to-end RL training run using RLTrainer + GymEnv (homogeneous and heterogeneous cases) to validate the integration.

Nov 07 '25 20:11 hallerite

All committers have signed the CLA.

Nov 07 '25 20:11 CLAassistant

nice! would you wanna do the PR on top of the trajectories branch? https://github.com/PrimeIntellect-ai/verifiers/pull/549

there's a bunch of new things which should make native gym-style rollouts much easier, especially for training.

also would be ideal if we still had a notion of a dataset, as this is how trainers typically expect to work with verifiers envs

can be generated at init with a given number of hidden state samples (as in textarena_env)

Nov 09 '25 12:11 willccbb

yes! will do it on top of it. I'll just wait for the other to be finished

Nov 09 '25 14:11 hallerite
