tanlong Du
When using the verlai/verl:app-verl0.5-transformers4.55.4-vllm0.10.0-mcore0.13.0-te2.2 image, an error occurs on the B200 server, but this issue does not exist on the H200 server. Is there currently a Docker image available for...
In verl's `megatron_workers`, the optimizer precision only supports bf16 and fp16, and earlier versions simply hard-coded it to bf16. For stable training, however, the optimizer's master weights should be fp32, which does not appear to be the case here. Can you explain why?

```python
# TODO: add more optimizer args into config
if self._is_actor:
    optim_config_megatron = init_megatron_optim_config(optim_config, fp16=self.dtype == torch.float16)
    actor_optimizer = get_megatron_optimizer(model=actor_module, config=optim_config_megatron)
    actor_optimizer_scheduler = get_megatron_optimizer_param_scheduler(
        optimizer=actor_optimizer, config=optim_config
    )
```
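For context, mixed-precision training stacks normally keep an fp32 master copy of the parameters inside the optimizer even when the model itself runs in bf16/fp16; the low-precision flag typically controls only the model/gradient dtype. The snippet below is a minimal, framework-agnostic sketch of that pattern in plain PyTorch, just to illustrate what "fp32 master weights" means in this question. `MasterWeightSGD` and its names are hypothetical and are not verl or Megatron APIs.

```python
import torch


class MasterWeightSGD:
    """Minimal sketch: SGD that keeps fp32 master weights for bf16 model params.

    Hypothetical illustration only -- not the verl/Megatron optimizer.
    """

    def __init__(self, params, lr=1e-3):
        self.lr = lr
        self.model_params = list(params)                   # bf16 params used in forward/backward
        self.master_params = [p.detach().clone().float()   # fp32 master copy held by the optimizer
                              for p in self.model_params]

    @torch.no_grad()
    def step(self):
        for model_p, master_p in zip(self.model_params, self.master_params):
            if model_p.grad is None:
                continue
            grad = model_p.grad.float()                    # upcast the bf16 gradient to fp32
            master_p -= self.lr * grad                     # update in fp32 to avoid precision loss
            model_p.copy_(master_p.to(model_p.dtype))      # cast back to bf16 for the next forward

    def zero_grad(self):
        for p in self.model_params:
            p.grad = None


# Usage sketch
model = torch.nn.Linear(8, 8).to(torch.bfloat16)
opt = MasterWeightSGD(model.parameters(), lr=1e-2)
loss = model(torch.randn(4, 8, dtype=torch.bfloat16)).sum()
loss.backward()
opt.step()
opt.zero_grad()
```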
### System Info

In the new version of the code, the reward calculation is folded into sequence generation, which results in the logic of overlong_buffer_cfg...
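For reference, `overlong_buffer_cfg` configures a DAPO-style soft length penalty: responses that extend into a buffer zone before the hard length limit receive a reward penalty proportional to how far they overshoot. The sketch below illustrates that shaping under assumed parameter names (`max_response_len`, `buffer_len`, `penalty_factor` are placeholders, not necessarily the exact config keys).

```python
def overlong_penalty(response_len: int,
                     max_response_len: int = 4096,
                     buffer_len: int = 512,
                     penalty_factor: float = 1.0) -> float:
    """Soft overlong penalty sketch (DAPO-style); parameter names are assumptions.

    Responses shorter than (max_response_len - buffer_len) get no penalty;
    responses inside the buffer zone get a penalty that grows linearly,
    reaching -penalty_factor at the hard limit.
    """
    expected_len = max_response_len - buffer_len
    overshoot = response_len - expected_len
    if overshoot <= 0:
        return 0.0
    return -min(overshoot / buffer_len, 1.0) * penalty_factor


# Example: a 4000-token response with a 4096 limit and a 512-token buffer
# overshoots the soft limit (3584) by 416 tokens -> penalty of about -0.81.
print(overlong_penalty(4000))
```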
### System Info

bug1: The Qwen3-4B model saved in HF format has only one weight file; one file is missing.
bug2: An error occurred when converting the Megatron-format weights to HF...
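As a point of comparison for bug1: when a checkpoint is exported with Hugging Face `save_pretrained`, the weights are sharded by `max_shard_size` (default "5GB"), so a ~4B-parameter model in bf16 (roughly 8 GB of weights) is normally written as two or more `model-0000x-of-0000N.safetensors` files plus an index. A minimal sketch of that export path is below; the repo id and output directory are only examples.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model; bf16 weights for a ~4B-parameter model total roughly 8 GB.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

# save_pretrained shards the weights at max_shard_size (default "5GB"), so this
# export should produce model-00001-of-0000N.safetensors, ..., plus
# model.safetensors.index.json -- i.e. more than one weight file.
model.save_pretrained("./qwen3-4b-hf-export", max_shard_size="5GB")
tokenizer.save_pretrained("./qwen3-4b-hf-export")
```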