verl
[WIP] PRIME algorithm
Refactor the PRIME algorithm (https://github.com/PRIME-RL/PRIME) and merge it into verl/main.
@hiyouga Have you seen the HF timeout issue before in the geo3k test?
Hi! I have a question about balance_batch. Why is it being made an optional feature in this PR? Are there any cases where balance_batch could have a negative impact? I find this a bit confusing and am unsure whether to enable it.