zhangyuqin1998 issues

Results: 11 issues of zhangyuqin1998

To add random exchange in NLJ and SPF when right child is DAS

### Task Description When the data volume in the pipeline is small, the GI operator can only divide a small number of granules, resulting in some workers being idle because...

Allocate Material Above Nodes

### Task Description In the volcano model, the pipeline's working mode makes it difficult for us to analyze the actual execution performance of a certain operator. To clarify the data...

Add RingFlashAttention for context parallel

### PR types New features ### PR changes Models ### Description Add ring flash attention support to fleet's context parallel.

Fix rng_state in llm models

### PR types Bug fixes ### PR changes Models ### Description Fix the issue where lmhead did not use rng_state.

[Fix PIR Unittest] fix pir ut for comm op

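### PR Category Others ### PR Types Others ### Description Fix the failing coverage UT. Under PIR, the static graph uses append op to add communication operators to the computation graph, but communication operators do not currently support PIR. In PIR mode, the UT therefore should not test the static-graph mode of communication operators.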

[Auto Parallel] Adding align mode support

### PR types Others ### PR changes Others ### Description Add partial support for align mode.

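[Auto Parallel] fix bugs for split_batches_for_accumulation && fix bugs for enable_delay_scale_loss

### PR types Bug fixes ### PR changes Others ### Description **A. Fix the case where, under dynamic-graph auto parallel, split_batches_for_accumulation cannot be aligned with dynamic-graph manual parallelism**, as shown in the figure: ![dae478d91bc7c7a10aa3bcc927793128](https://github.com/user-attachments/assets/693cf4d1-009a-46b2-b1d1-17e5c1305f2e) **B. Fix the incorrect enable_delay_scale_loss logic under dynamic-graph auto parallel.** Auto parallel implements enable_delay_scale_loss by default. The expected behavior is: 1. Compute the loss for each micro batch 2. Backpropagate 3. Sum the losses within the mini batch 4. Scale the loss by dividing by the acc count. However, the current behavior of dynamic-graph auto parallel is: 1. Each micro...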
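
The four expected steps of enable_delay_scale_loss listed above can be illustrated with a minimal sketch. It does not use Paddle and is not taken from the PR: the toy model (prediction = w * x with a squared-error loss) and the function name `delayed_scale_loss` are hypothetical, chosen only to make the order of operations concrete.

```python
def delayed_scale_loss(micro_batches, w):
    """Toy model (an assumption for illustration): loss = (w * x - y) ** 2."""
    acc = len(micro_batches)  # number of accumulation steps
    loss_sum = 0.0
    grad_sum = 0.0
    for x, y in micro_batches:
        err = w * x - y
        loss = err ** 2            # 1. loss of this micro batch, left unscaled
        grad_sum += 2.0 * err * x  # 2. backward pass: accumulate d(loss)/dw
        loss_sum += loss           # 3. sum the losses within the mini batch
    # 4. scale the summed loss once, dividing by the acc count
    return loss_sum / acc, grad_sum


loss, grad = delayed_scale_loss([(1.0, 2.0), (2.0, 2.0)], w=1.0)
print(loss, grad)
```

The sketch returns the accumulated gradient unscaled, because the description above only states that the loss is scaled; how the real implementation treats gradients is not covered here.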

[Auto Parallel] fix loss sum for auto parallel

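### PR types Others ### PR changes Others ### Description When acc > 1, auto parallel previously summed the loss with numpy.sum, while dynamic-graph manual parallelism sums it with paddle.add; the two results differ slightly. In align mode, auto parallel now matches manual parallelism.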
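
One possible source of such a small diff is that floating-point addition is not associative, so two routines that add the same values in a different order can return different results. The sketch below uses plain Python floats and deliberately exaggerated values. The helper names `sequential_sum` and `pairwise_sum` are hypothetical and do not reproduce the internals of numpy.sum or Paddle.

```python
def sequential_sum(values):
    """Add left to right, like repeatedly calling an elementwise add."""
    total = 0.0
    for v in values:
        total = total + v
    return total


def pairwise_sum(values):
    """Add in a tree order: split in half, sum each half, then combine."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Exaggerated values: 1.0 is below the rounding step of a double near 1e16.
values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))  # 1.0
print(pairwise_sum(values))    # 0.0
```

Neither result equals the exact sum of 2, which is why aligning two implementations usually means using the same summation routine on both sides rather than expecting different routines to agree bit for bit.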

[WIP][Distributed] FlashEP: A Flexible and General Strategy for Deep Communication-Computation Overlap for Mixture-of-Experts

### PR Category Distributed Strategy ### PR Types New features ### Description In **large-scale, highly sparse Mixture-of-Experts (MoE) training**, cross-machine communication time accounts for a significant portion of the end-to-end...

Add forward_backward_overlap_scheduler in pipeline_parallel_config

### PR types New features ### PR changes Others ### Description Add forward_backward_overlap_scheduler in pipeline_parallel_config This PR is associated with https://github.com/PaddlePaddle/Paddle/pull/72150 ![image](https://github.com/user-attachments/assets/ca1ed1da-9a6e-417b-b8ef-6128d0afcd40)