OC

Results 26 comments of OC

The image is outdated. This is a workaround:

1. Pull the image `volcengine/sandbox-fusion:server-20250609`.
2. Start a Docker container from it and replace the source code inside the container.
3. Save the container as a new image.
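The workaround steps above can be sketched as a sequence of standard Docker CLI calls (`pull`, `run`, `cp`, `commit`). This is only a rough sketch: the container name, the local source path, the destination path inside the container, and the new image tag are all hypothetical placeholders, since the comment does not say where the source code lives in the image.

```python
import subprocess
from typing import List

# Image named in the comment above.
IMAGE = "volcengine/sandbox-fusion:server-20250609"


def workaround_commands(
    container: str, local_src: str, container_dst: str, new_tag: str
) -> List[List[str]]:
    """Build the docker commands for the pull / patch / commit workaround.

    All four arguments are hypothetical placeholders chosen by the caller.
    """
    return [
        ["docker", "pull", IMAGE],
        # Start a container from the outdated image.
        ["docker", "run", "-d", "--name", container, IMAGE],
        # Replace the source code inside the container with a local checkout.
        ["docker", "cp", local_src, f"{container}:{container_dst}"],
        # Save the patched container as a new image.
        ["docker", "commit", container, new_tag],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute each command, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all() to execute.
    for argv in workaround_commands(
        "sandbox-fusion-patch",      # hypothetical container name
        "./SandboxFusion",           # hypothetical local source checkout
        "/path/in/container",        # hypothetical destination, check the image
        "sandbox-fusion:patched",    # hypothetical new image tag
    ):
        print(" ".join(argv))
```

Printing the commands first makes it easy to check the paths before anything is executed; the destination path in particular has to be verified against the actual image layout.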

> > from verl/workers/fsdp_workers.py.
> > ```
> > torch_dtype = fsdp_config.get("model_dtype", None)
> > if torch_dtype is None:
> >     torch_dtype = torch.float32 if self._is_actor else torch.bfloat16
> > ...
> > ```

This log can be ignored. The actor should use fp32 so that the FSDP optimizer runs in fp32. Setting `actor_rollout_ref.actor.fsdp_config.model_dtype=bfloat16` is not recommended.
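To make the default behavior concrete, here is a minimal sketch of the dtype selection from the quoted snippet. It uses plain strings instead of `torch` dtypes so it runs without PyTorch, and `resolve_model_dtype` is a hypothetical helper name, not a function in verl.

```python
from typing import Any, Mapping


def resolve_model_dtype(fsdp_config: Mapping[str, Any], is_actor: bool) -> str:
    """Mirror the quoted logic: an explicit model_dtype wins, otherwise
    the actor defaults to float32 and every other role to bfloat16.

    Hypothetical helper; dtype names are strings standing in for torch dtypes.
    """
    dtype = fsdp_config.get("model_dtype", None)
    if dtype is None:
        dtype = "float32" if is_actor else "bfloat16"
    return dtype


# No override: the actor stays in fp32, other roles use bf16.
print(resolve_model_dtype({}, is_actor=True))    # float32
print(resolve_model_dtype({}, is_actor=False))   # bfloat16

# Explicit override (the setting the comment advises against for the actor).
print(resolve_model_dtype({"model_dtype": "bfloat16"}, is_actor=True))  # bfloat16
```

Leaving `model_dtype` unset is therefore enough to keep the actor in fp32.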

I can reproduce this error using the latest image: whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.0-te2.3. It can also be reproduced on my host after the sglang and megatron dependencies were installed. The problem may come with package...

> May I ask if there is a solution to this issue? I still have this problem with the latest version of Verl

Does this help? `export VLLM_USE_V1=1 && ray...

Yes, you are right. We need a better method to enable async rollout.