Mingyang Song

1 comment by Mingyang Song

Hi, thanks for your interest! We don’t use DeepSpeed features during the RL procedure, as they may conflict with vLLM-based rollouts. However, we provide a ZeRO-3 config for TF-EVAL inference...