xiaoyao0115
### PR Category
CINN

### PR Types
Not User Facing

### Description
Pcard-67164
# Description
Fused the multiple kernels in the output (`out`) correction step of attention under context parallelism (CP) into a single kernel. This reduces both the total correction runtime and the CPU overhead of launching kernels.

## Type...
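The PR text does not show the kernel. As a hedged illustration, here is a minimal Python sketch of what an attention output correction in CP typically computes. It assumes the standard log-sum-exp merge: each CP step yields a partial attention output plus the log-sum-exp (LSE) of its scores, and these partials are rescaled and accumulated into the final output. The helper names `attention_row` and `merge_partial_attention` are hypothetical and do not come from the PR. A real fused kernel would run this per element on the GPU rather than in a Python loop.

```python
import math
from typing import List, Sequence, Tuple


def attention_row(scores: List[float], values: List[List[float]]) -> Tuple[List[float], float]:
    """Hypothetical reference: softmax attention for ONE query row.

    Returns the normalized output and the log-sum-exp of the scores.
    """
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def merge_partial_attention(
    partials: Sequence[Tuple[List[float], float]],
) -> Tuple[List[float], float]:
    """Hypothetical sketch of the CP output correction for ONE query row.

    Each entry is (out_i, lse_i) computed over a disjoint chunk of keys.
    The max, exp, rescale, and accumulate steps all happen in one loop body,
    which is the kind of work a single fused kernel would do.
    """
    if not partials:
        raise ValueError("need at least one partial result")
    out_acc, lse_acc = list(partials[0][0]), partials[0][1]
    for out_i, lse_i in partials[1:]:
        # Running LSE: log(exp(lse_acc) + exp(lse_i)), computed stably.
        m = max(lse_acc, lse_i)
        lse_new = m + math.log(math.exp(lse_acc - m) + math.exp(lse_i - m))
        # Each partial is rescaled by its share of the combined normalizer.
        w_acc = math.exp(lse_acc - lse_new)
        w_i = math.exp(lse_i - lse_new)
        out_acc = [w_acc * a + w_i * b for a, b in zip(out_acc, out_i)]
        lse_acc = lse_new
    return out_acc, lse_acc


if __name__ == "__main__":
    scores = [0.5, -1.0, 2.0, 0.3]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
    full_out, _ = attention_row(scores, values)
    # Split the keys into two "CP steps", then correct and merge the partials.
    merged_out, _ = merge_partial_attention(
        [attention_row(scores[:2], values[:2]), attention_row(scores[2:], values[2:])]
    )
    print(all(abs(a - b) < 1e-9 for a, b in zip(merged_out, full_out)))
```

In an unfused implementation, each of these element-wise steps (max, exp, multiply, add) could run as a separate kernel launch. Folding them into one pass matches the motivation stated above, although the PR's actual kernel layout is not described here.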
This PR is the second part of hybrid-cp. The first part is https://github.com/NVIDIA/Megatron-LM/pull/2054 (PR for the main branch: https://github.com/NVIDIA/Megatron-LM/pull/2304). Compared to part 1, this PR adds the following:
- Added support for...