
Support flowgrpo and mixgrpo

Open • Kirrito-k423 opened this issue 3 weeks ago • 0 comments

Feature request

Support Flow-GRPO and MixGRPO.

Motivation

A growing number of papers on multimodal generative models are exploring reinforcement learning (RL) rather than Direct Preference Optimization (DPO).

Your contribution

TODO

Kirrito-k423, Dec 01 '25 03:12