verl
Make DAPO rollout faster and more efficient (Refactor ShardingManager)
Thank you for sharing the great codebase.
While experimenting with DAPO, I observed that model resharding/offloading occurs multiple times when filter_groups is enabled. This is because the current ShardingManager context reverts all sharded/offloaded models at __exit__, which becomes inefficient, especially with large models (e.g., 70B), when the context is re-entered multiple times without any model update in between.
To address this, I refactored the lifecycle of ShardingManager into three separate functions: enter, rollout, and exit. This allows the model to be sharded/offloaded and later reverted only once, rather than at every rollout step. __enter__ and __exit__ operate the same as before.
To minimize interface changes, I kept some dummy arguments (e.g., dummy input to setup_generate_sequences_efficient and teardown_generate_sequences_efficient), which can be revisited later. Feedback on this approach or implementation details is highly appreciated.
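To illustrate the lifecycle split described above, here is a rough sketch. Only the setup_generate_sequences_efficient / teardown_generate_sequences_efficient names come from this PR; all other names and method bodies are hypothetical placeholders, not the actual verl implementation:

```python
# Sketch: the original context manager reshards on every __enter__/__exit__,
# while the split lifecycle pays the resharding cost once around N rollouts.

class ShardingManager:
    def __init__(self):
        self.sharded = False

    # --- original context-manager behavior (unchanged) ---
    def __enter__(self):
        self._load_weights_to_rollout_engine()
        return self

    def __exit__(self, *exc):
        self._offload_rollout_engine()

    # --- split lifecycle: reshard once for many rollout steps ---
    def setup_generate_sequences_efficient(self, _dummy=None):
        # Shard/offload the training model and sync weights once up front.
        self._load_weights_to_rollout_engine()

    def teardown_generate_sequences_efficient(self, _dummy=None):
        # Revert sharding/offloading only after all rollouts are done.
        self._offload_rollout_engine()

    # hypothetical placeholders for FSDP weight gather/sync and freeing
    def _load_weights_to_rollout_engine(self):
        self.sharded = True

    def _offload_rollout_engine(self):
        self.sharded = False


def generate_batches(mgr, num_gen_batches):
    # Before: every batch paid the __enter__/__exit__ resharding cost.
    # After: one setup/teardown pair brackets all num_gen_batches rollouts.
    mgr.setup_generate_sequences_efficient()
    outputs = []
    for _ in range(num_gen_batches):
        assert mgr.sharded  # weights stay loaded/synced between rollouts
        outputs.append("rollout")
    mgr.teardown_generate_sequences_efficient()
    return outputs
```

With filter_groups enabled, generate_batches stands in for the loop that regenerates batches until enough groups pass the filter; the point is that resharding brackets the loop instead of each iteration.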
Note: This PR is not yet ready to be merged. Pending tasks include:
- Add support for Megatron (currently FSDP-only)
- Test & benchmark performance improvements
- Polish the implementation for readability and maintainability
Please let me know if I overlooked anything. Thanks in advance!
Verified the execution in my environment and observed approximately 20% speedup in rollout generation using the Qwen32B model (350s → 300s).
Excellent! But I don't understand how the second rollout step can call generate_sequences without the enter function. Doesn't it miss a model weight sync? Could you explain this for me? I'm a newbie; looking forward to your answer.
OK, I see the reason now, haha. It works because of the num_gen_batches scenario: the weights don't change between those rollouts, so no re-sync is needed.