boardman
Thank you for the reply. In the absence of role-level rewards and with a static workflow, can we regard the setup as equivalent to GRPO optimized with a trajectory-level reward?
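To make the question concrete, here is a minimal sketch of what "GRPO with a trajectory-level reward" could look like under my reading. It is not the authors' implementation. The function names, the role names, and the use of the population standard deviation for normalization are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return one scalar advantage per trajectory, normalized within the group.

    `rewards` holds one trajectory-level reward per sampled rollout of the
    same prompt. Normalizing by the population std is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_roles(advantages, roles):
    """Assign each trajectory's advantage to every role in that trajectory.

    With no role-level reward and a static workflow, all roles in a rollout
    would share the same advantage (hypothetical role names).
    """
    return [{role: adv for role in roles} for adv in advantages]


# Four rollouts of one prompt, each scored only at the trajectory level.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
per_role = broadcast_to_roles(advantages, roles=["planner", "solver"])
```

In this sketch every role inside a rollout receives an identical learning signal, which is why the setup looks to me like plain trajectory-level GRPO.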