zhangyuqin1998
### Task Description When the data volume in the pipeline is small, the GI operator can only divide a small number of granules, resulting in some workers being idle because...
### Task Description In the volcano model, the pipeline's working mode makes it difficult for us to analyze the actual execution performance of a certain operator. To clarify the data...
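### PR types New features ### PR changes Models ### Description Add ring flash attention support to fleet's context parallel
### PR types Bug fixes ### PR changes Models ### Description Fix lmhead not using rng_state
### PR Category Others ### PR Types Others ### Description Fix the failing coverage UT. Under PIR, the static graph uses append op to add communication operators to the computation graph, but communication operators do not yet support PIR. In PIR mode, therefore, the UT should not test the static-graph mode of communication operators.
### PR types Others ### PR changes Others ### Description Add partial support for alignment mode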
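…gs for enable_delay_scale_loss ### PR types Bug fixes ### PR changes Others ### Description **A. Fix the case where split_batches_for_accumulation cannot be aligned with dynamic-graph manual parallelism under dynamic-graph auto parallel** (see figure). **B. Fix the incorrect enable_delay_scale_loss logic under dynamic-graph auto parallel**. Auto parallel implements enable_delay_scale_loss by default, and the expected behavior is: 1. Compute the loss for each micro batch 2. Run backpropagation 3. Sum the losses within the mini batch 4. Scale the loss by dividing it by the number of accumulation steps (acc). However, the current behavior of dynamic-graph auto parallel is: 1. Each micro...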
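### PR types Others ### PR changes Others ### Description When acc > 1, auto parallel previously summed the loss with numpy.sum, while dynamic-graph manual parallelism sums it with paddle.add, and the two results differ slightly. In alignment mode, make auto parallel match manual parallelism.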
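One plausible source of such a small diff is that floating-point addition is not associative, so two reductions that add the same losses in a different order can disagree in the last bits. The sketch below uses plain Python floats to show the effect. `sequential_sum`, `pairwise_sum`, and the loss values are hypothetical illustrations, not the actual numpy.sum or paddle.add implementations, and the PR does not state that summation order is the cause.

```python
# Illustrative only: neither function is NumPy's or Paddle's real kernel.

def sequential_sum(values):
    """Left-to-right accumulation, like repeatedly applying an add op."""
    total = 0.0
    for v in values:
        total = total + v
    return total


def pairwise_sum(values):
    """Tree-style reduction: sum each half, then add the two halves."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Hypothetical per-micro-batch losses (acc = 3).
micro_batch_losses = [0.1, 0.2, 0.3]

seq = sequential_sum(micro_batch_losses)
tree = pairwise_sum(micro_batch_losses)

# Same inputs, different grouping, results differ in the last bit.
print(seq == tree, abs(seq - tree))
```

Under this assumption, routing both code paths through the same accumulation order removes the discrepancy, which is consistent with what the alignment mode described above aims for.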
### PR Category Distributed Strategy ### PR Types New features ### Description In **large-scale, highly sparse Mixture-of-Experts (MoE) training**, cross-machine communication time accounts for a significant portion of the end-to-end...
### PR types New features ### PR changes Others ### Description Add forward_backward_overlap_scheduler in pipeline_parallel_config This PR is associated with https://github.com/PaddlePaddle/Paddle/pull/72150