verl

[Feature Request]: Support Parallel Reward Calculation for Time-consuming Methods

Open AIBionics opened this issue 1 year ago • 0 comments

Hello,

I would like to propose a feature enhancement aimed at improving the efficiency of reward calculation, particularly for time-consuming methods. Currently, the system waits for all rollouts to complete before initiating the reward calculations. This proposal requests the implementation of parallel processing capabilities so that reward calculations can begin immediately after each rollout completes.
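To make the request concrete, here is a minimal sketch of the behavior I have in mind, using only the Python standard library. The names `generate_rollouts`, `slow_reward`, and `rollouts_with_overlapped_rewards` are hypothetical placeholders for illustration and are not verl APIs; the sleeps stand in for generation and reward latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_rollouts(prompts):
    """Hypothetical rollout generator: yields (index, response) as each one finishes."""
    for i, prompt in enumerate(prompts):
        time.sleep(0.01)  # stand-in for generation latency
        yield i, prompt + " -> response"


def slow_reward(response):
    """Hypothetical slow reward function, e.g. a call to a remote reward model."""
    time.sleep(0.05)  # stand-in for network or compute latency
    return float(len(response))


def rollouts_with_overlapped_rewards(prompts, max_workers=8):
    """Start scoring each rollout as soon as it is available."""
    rewards = [None] * len(prompts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for i, response in generate_rollouts(prompts):
            # Submit scoring right away; the loop moves on to the next rollout
            # while this reward is computed in a worker thread.
            futures[i] = pool.submit(slow_reward, response)
        # Collect results in the original rollout order.
        for i, fut in futures.items():
            rewards[i] = fut.result()
    return rewards
```

Threads fit here because the expensive part is assumed to be I/O-bound (a remote call); a CPU-bound reward function would likely need a process pool instead.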

This change would be especially beneficial for scenarios involving a remote reward model (RM) or other computationally intensive reward functions. If fully independent, per-rollout reward calculation is not feasible in the short term, computing rewards independently for each small batch of rollouts would also be a valuable intermediate step.
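The small-batch fallback could look roughly like the sketch below. Again, this is only an illustration under assumptions: `score_batch` is a hypothetical batched reward call, `response_stream` is any iterable that yields responses as they finish, and none of these names come from verl.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(iterable, size):
    """Yield consecutive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def score_batch(responses):
    """Hypothetical batched reward call, e.g. one request to a remote RM."""
    return [float(len(r)) for r in responses]


def rewards_by_micro_batch(response_stream, batch_size=4, max_workers=4):
    """Score each micro-batch as soon as it fills, without waiting for the rest."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each chunk is submitted as soon as it has been read from the stream,
        # so scoring overlaps with the production of later responses.
        futures = [
            pool.submit(score_batch, chunk)
            for chunk in batched(response_stream, batch_size)
        ]
        # Flatten the per-batch results back into rollout order.
        return [reward for fut in futures for reward in fut.result()]
```

With `batch_size=1` this reduces to the fully per-rollout case, so the batch size could be exposed as a tuning knob.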

Thank you for considering this feature request. I look forward to your thoughts and any potential updates on this front.

AIBionics, Feb 27 '25 11:02