xiaoyao0115

Results 3 issues of xiaoyao0115

### PR Category CINN ### PR Types Not User Facing ### Description Pcard-67164

contributor

# Description Fused the multiple kernels used in the attention output correction for context parallelism (CP) into a single kernel, reducing both the total correction runtime and the CPU kernel-launch overhead. ## Type...

This PR is the second part of hybrid-cp. The first part is: https://github.com/NVIDIA/Megatron-LM/pull/2054 (PR for the main branch: https://github.com/NVIDIA/Megatron-LM/pull/2304). Compared to part 1, this PR adds the following: - Added support for...

enhancement
module: moe
dev branch