Docs + Environment pattern: RLHF

Open bxyu-nvidia opened this issue 1 month ago • 0 comments

Use cases, pain points, and background

Description:

Design: We probably need a generic reward model client that serves as shared infrastructure for all RLHF environments.
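As a rough illustration of what "generic" could mean here, the sketch below has each environment depend only on a small client interface, so the reward model backend can be swapped without touching environment code. Every name in it (`RewardModelClient`, `LengthRewardClient`, `RLHFEnvironment`, `score`, `verify`) is hypothetical and not taken from any existing codebase, and the length-based scorer is only a toy stand-in for a real reward model call.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class RewardModelClient(ABC):
    """Hypothetical shared interface that every RLHF environment scores through."""

    @abstractmethod
    def score(self, prompt: str, response: str) -> float:
        """Return a scalar reward for `response` given `prompt`."""


class LengthRewardClient(RewardModelClient):
    """Toy stand-in for a real reward model (assumption, for illustration only).

    Rewards longer responses linearly, capped at 1.0.
    """

    def __init__(self, target_chars: int = 100) -> None:
        self.target_chars = target_chars

    def score(self, prompt: str, response: str) -> float:
        return min(len(response) / self.target_chars, 1.0)


@dataclass
class RLHFEnvironment:
    """Hypothetical environment that knows only the client interface."""

    reward_client: RewardModelClient

    def verify(self, prompt: str, response: str) -> float:
        # Delegate scoring so the backend can change without editing the environment.
        return self.reward_client.score(prompt, response)


env = RLHFEnvironment(reward_client=LengthRewardClient(target_chars=10))
reward = env.verify("Say hi.", "Hello")  # 5 characters against a target of 10
```

A real client would presumably wrap a served reward model behind the same `score` method. How batching, retries, and authentication should work is left open here.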

Out of scope:

Acceptance Criteria:

Nov 19 '25 02:11 bxyu-nvidia