PrometheusComing comments

Results: 4 comments of PrometheusComing

[Bug]FSDP2 failed to load large model state_dict

> Located at:
>
> [verl/verl/utils/fsdp_utils.py](https://github.com/volcengine/verl/blob/15b1b15f9963e178b68368c9b3996c60637a5156/verl/utils/fsdp_utils.py#L392-L420)
>
> Lines 392 to 420 in [15b1b15](/volcengine/verl/commit/15b1b15f9963e178b68368c9b3996c60637a5156)
>
> def fsdp2_load_full_state_dict(model: torch.nn.Module, full_state: dict, device_mesh=None, cpu_offload=None):
> """
> Loads the full state...

[fsdp2] fix: oom issue when loading model state dict in fsdp2

Hi, I used your code, but it runs out of memory (OOM) at `set_model_state_dict`. If I switch back to FSDP, it runs normally. Can you give me some suggestions? Thank you very much.

Make DAPO rollout faster and more efficient (Refactor ShardingManager)

> Verified the execution in my environment and observed approximately 20% speedup in rollout generation using the Qwen32B model (350s → 300s).

Excellent! But I cannot understand the second step...

Make DAPO rollout faster and more efficient (Refactor ShardingManager)

> > Verified the execution in my environment and observed approximately 20% speedup in rollout generation using the Qwen32B model (350s → 300s).
>
> Excellent! But I cannot understand...