
Docs + Environment pattern: RLHF

Open bxyu-nvidia opened this issue 1 month ago • 0 comments

Use cases, pain points, and background

Description:

Design: We probably need a generic reward model client that serves as shared infrastructure for all RLHF environments.
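One possible shape for such a shared client, as a rough sketch only. Everything here is an assumption for discussion, not existing Gym code: the `RewardModelClient` interface, the `TransportRewardModelClient` class, the payload and reply fields (`inputs`, `scores`), and the `fake_transport` stand-in are all hypothetical. The idea is that environments depend only on a small `score` interface, while the transport (for example, an HTTP call to a locally served reward model) is injected.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


class RewardModelClient(Protocol):
    """Hypothetical interface that every RLHF environment would depend on."""

    def score(self, prompt: str, responses: Sequence[str]) -> list[float]:
        ...


# A transport takes a JSON-serializable payload and returns the decoded reply.
# In a real setup this could wrap an HTTP POST to a locally served model.
Transport = Callable[[dict], dict]


@dataclass
class TransportRewardModelClient:
    """Scores responses through an injected transport (assumed wire format)."""

    transport: Transport
    model: str = "reward-model"  # placeholder name, not a real model id

    def score(self, prompt: str, responses: Sequence[str]) -> list[float]:
        payload = {
            "model": self.model,
            "inputs": [{"prompt": prompt, "response": r} for r in responses],
        }
        reply = self.transport(payload)
        scores = [float(s) for s in reply["scores"]]
        # Guard against a server that drops or duplicates items.
        if len(scores) != len(responses):
            raise ValueError("reward model returned a wrong number of scores")
        return scores


def fake_transport(payload: dict) -> dict:
    """Stand-in for a reward model server: longer responses score higher."""
    return {"scores": [len(item["response"]) for item in payload["inputs"]]}


client = TransportRewardModelClient(transport=fake_transport)
rewards = client.score("Say hi.", ["hi", "hello there"])
```

Injecting the transport keeps the client testable without a running server, and lets each environment reuse the same scoring code regardless of how the reward model is hosted.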

Out of scope:

Acceptance Criteria:

  • [ ] Gym spins up a reward model locally, as in the local vLLM model flow
  • [ ] Replicate the current Nemotron RLHF process

bxyu-nvidia, Nov 19 '25 02:11