tanlong Du
When using the verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2 image, an error occurs on the B200 server, but this issue does not exist on the H200 server. Is there currently a Docker image available for...
In verl's `megatron_workers`, the optimizer precision only supports bf16 and fp16, and earlier versions simply hard-coded it to bf16. For stable training, however, the optimizer's master weights should be fp32, which does not appear to be the case here. Can you explain why?

```python
# TODO: add more optimizer args into config
if self._is_actor:
    optim_config_megatron = init_megatron_optim_config(optim_config, fp16=self.dtype == torch.float16)
    actor_optimizer = get_megatron_optimizer(model=actor_module, config=optim_config_megatron)
    actor_optimizer_scheduler = get_megatron_optimizer_param_scheduler(
        optimizer=actor_optimizer, config=optim_config
    )
```
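For context, mixed-precision training stacks normally keep an fp32 master copy of the parameters inside the optimizer even when the model itself runs in bf16/fp16; the low-precision flag typically controls only the model/gradient dtype. The snippet below is a minimal, framework-agnostic sketch of that pattern in plain PyTorch, just to illustrate what "fp32 master weights" means in this question. `MasterWeightSGD` and its names are hypothetical and are not verl or Megatron APIs.

```python
import torch


class MasterWeightSGD:
    """Minimal sketch: SGD that keeps fp32 master weights for bf16 model params.

    Hypothetical illustration only -- not the verl/Megatron optimizer.
    """

    def __init__(self, params, lr=1e-3):
        self.lr = lr
        self.model_params = list(params)                   # bf16 params used in forward/backward
        self.master_params = [p.detach().clone().float()   # fp32 master copy held by the optimizer
                              for p in self.model_params]

    @torch.no_grad()
    def step(self):
        for model_p, master_p in zip(self.model_params, self.master_params):
            if model_p.grad is None:
                continue
            grad = model_p.grad.float()                    # upcast the bf16 gradient to fp32
            master_p -= self.lr * grad                     # update in fp32 to avoid precision loss
            model_p.copy_(master_p.to(model_p.dtype))      # cast back to bf16 for the next forward

    def zero_grad(self):
        for p in self.model_params:
            p.grad = None


# Usage sketch
model = torch.nn.Linear(8, 8).to(torch.bfloat16)
opt = MasterWeightSGD(model.parameters(), lr=1e-2)
loss = model(torch.randn(4, 8, dtype=torch.bfloat16)).sum()
loss.backward()
opt.step()
opt.zero_grad()
```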
### System Info

In the new version of the code, the reward calculation is folded into sequence generation, which results in the logic of overlong_buffer_cfg...
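For reference, `overlong_buffer_cfg` configures a DAPO-style soft length penalty: responses that extend into a buffer zone before the hard length limit receive a reward penalty proportional to how far they overshoot. The sketch below illustrates that shaping under assumed parameter names (`max_response_len`, `buffer_len`, `penalty_factor` are placeholders, not necessarily the exact config keys).

```python
def overlong_penalty(response_len: int,
                     max_response_len: int = 4096,
                     buffer_len: int = 512,
                     penalty_factor: float = 1.0) -> float:
    """Soft overlong penalty sketch (DAPO-style); parameter names are assumptions.

    Responses shorter than (max_response_len - buffer_len) get no penalty;
    responses inside the buffer zone get a penalty that grows linearly,
    reaching -penalty_factor at the hard limit.
    """
    expected_len = max_response_len - buffer_len
    overshoot = response_len - expected_len
    if overshoot <= 0:
        return 0.0
    return -min(overshoot / buffer_len, 1.0) * penalty_factor


# Example: a 4000-token response with a 4096 limit and a 512-token buffer
# overshoots the soft limit (3584) by 416 tokens -> penalty of about -0.81.
print(overlong_penalty(4000))
```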
### System Info

bug1: The Qwen3-4B model saved in HF format has only one weight file; one file is missing.
bug2: An error occurred when converting the Megatron-format weights to HF...
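As a point of comparison for bug1: when a checkpoint is exported with Hugging Face `save_pretrained`, the weights are sharded by `max_shard_size` (default "5GB"), so a ~4B-parameter model in bf16 (roughly 8 GB of weights) is normally written as two or more `model-0000x-of-0000N.safetensors` files plus an index. A minimal sketch of that export path is below; the repo id and output directory are only examples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model; bf16 weights for a ~4B-parameter model total roughly 8 GB.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

# save_pretrained shards the weights at max_shard_size (default "5GB"), so this
# export should produce model-00001-of-0000N.safetensors, ..., plus
# model.safetensors.index.json -- i.e. more than one weight file.
model.save_pretrained("./qwen3-4b-hf-export", max_shard_size="5GB")
tokenizer.save_pretrained("./qwen3-4b-hf-export")
```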