zhangyuqin1998

Results 11 issues of zhangyuqin1998

### Task Description When the data volume in the pipeline is small, the GI operator can only divide a small number of granules, resulting in some workers being idle because...

### Task Description In the volcano model, the pipeline's working mode makes it difficult for us to analyze the actual execution performance of a certain operator. To clarify the data...

### PR types New features ### PR changes Models ### Description Add ring flash attention support to fleet's context parallel.

### PR types Bug fixes ### PR changes Models ### Description Fix the issue where lmhead did not use rng_state.

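### PR Category Others ### PR Types Others ### Description Fix the failing coverage UT. Under PIR, the static graph uses append op to add communication operators to the computation graph, but communication operators do not currently support PIR. In PIR mode, therefore, the UT should not test the static-graph mode of communication operators.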

### PR types Others ### PR changes Others ### Description Add partial support for alignment mode.

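…gs for enable_delay_scale_loss ### PR types Bug fixes ### PR changes Others ### Description **A. Fix the case where split_batches_for_accumulation under dynamic-graph auto parallel could not be aligned with dynamic-graph manual parallel.** See the figure: ![dae478d91bc7c7a10aa3bcc927793128](https://github.com/user-attachments/assets/693cf4d1-009a-46b2-b1d1-17e5c1305f2e) **B. Fix the incorrect enable_delay_scale_loss logic under dynamic-graph auto parallel.** Auto parallel implements enable_delay_scale_loss by default, and the expected behavior is: 1. Compute the loss for each micro batch 2. Run backpropagation 3. Sum the losses within the mini batch 4. Scale the loss by dividing it by the acc count. However, the current behavior of dynamic-graph auto parallel is: 1. Each micro...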
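The four expected steps listed in this description can be sketched in plain Python. This is only an illustration of the stated order of operations, not the Paddle implementation: `accumulate_with_delayed_scale`, `loss_fn`, and `backward_fn` are hypothetical names, and the losses are plain floats rather than tensors.

```python
def accumulate_with_delayed_scale(micro_batches, loss_fn, backward_fn):
    """Hypothetical helper: sum per-micro-batch losses, scale once at the end."""
    acc_steps = len(micro_batches)
    loss_sum = 0.0
    for micro_batch in micro_batches:
        loss = loss_fn(micro_batch)   # 1. compute the loss for this micro batch
        backward_fn(loss)             # 2. run backpropagation (stubbed out here)
        loss_sum += loss              # 3. sum the losses within the mini batch
    return loss_sum / acc_steps       # 4. scale: divide by the acc count


# Toy usage: the "loss" is just the sum of the micro batch values.
seen_losses = []
mini_batch_loss = accumulate_with_delayed_scale(
    micro_batches=[[1.0, 2.0], [3.0, 4.0]],
    loss_fn=sum,
    backward_fn=seen_losses.append,  # records each loss instead of computing gradients
)
```

The point of the sketch is that the division by the acc count happens once, after the sum, rather than on each micro batch loss.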

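### PR types Others ### PR changes Others ### Description When acc > 1, auto parallel previously summed the loss with numpy.sum, while dynamic-graph manual parallel sums it with paddle.add, and the two results differ slightly. In alignment mode, make auto parallel match manual parallel.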
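A small diff of this kind is what floating-point addition produces when the same values are summed in a different order. The sketch below uses only the Python standard library. `sequential_sum` mimics repeated pairwise adds, and `pairwise_sum` is a stand-in for a reduction that groups terms differently; neither is claimed to reproduce the exact internals of numpy.sum or paddle.add.

```python
def sequential_sum(values):
    """Add values left to right, as a chain of binary adds would."""
    total = values[0]
    for value in values[1:]:
        total = total + value
    return total


def pairwise_sum(values):
    """Split the list in half recursively and add the two partial sums."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


losses = [0.1, 0.2, 0.3]              # hypothetical per-micro-batch losses
left_to_right = sequential_sum(losses)  # (0.1 + 0.2) + 0.3
regrouped = pairwise_sum(losses)        # 0.1 + (0.2 + 0.3)
difference = abs(left_to_right - regrouped)
```

Here the two results differ in the last bit of the double-precision value, which is why bitwise alignment between two code paths requires using the same summation routine in both.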

### PR Category Distributed Strategy ### PR Types New features ### Description In **large-scale, highly sparse Mixture-of-Experts (MoE) training**, cross-machine communication time accounts for a significant portion of the end-to-end...

### PR types New features ### PR changes Others ### Description Add forward_backward_overlap_scheduler in pipeline_parallel_config This PR is associated with https://github.com/PaddlePaddle/Paddle/pull/72150 ![image](https://github.com/user-attachments/assets/ca1ed1da-9a6e-417b-b8ef-6128d0afcd40)