boardman

Thank you for the reply. In the absence of role-level rewards and with a static workflow, can we regard the setup as equivalent to GRPO optimized with a trajectory-level reward?
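
To make the question concrete, here is a minimal sketch of the reading I have in mind. It is my own illustration, not the authors' implementation: each sampled trajectory gets one scalar reward, advantages are normalized within the group as in GRPO, and every role in the fixed workflow receives the same advantage. The role names and the helper functions are hypothetical, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each trajectory reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]

def broadcast_to_roles(advantages, roles):
    """With no role-level reward, every role in a trajectory shares its advantage."""
    return [{role: adv for role in roles} for adv in advantages]

# One group of four trajectories sampled for the same prompt (hypothetical rewards).
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_relative_advantages(rewards)
per_role = broadcast_to_roles(advs, ["planner", "executor"])  # hypothetical role names
```

If this matches the setup, then the only difference from single-agent GRPO would be that the tokens being weighted come from several roles rather than one policy call.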