TensorRT-LLM

fix: AllReduce CUDA Graph Fix + Kernel Clean up

Open yizhang-nv opened this issue 9 months ago • 0 comments

This PR contains the following changes:

  1. Remove all allreduce kernels from customAllreduceKernels.cu except the pre_post_norm fusion kernel.
  2. Unify the workspace of the old and new fusion kernels.
  3. Fix a bug where the allreduce kernel may produce wrong results with CUDA graph enabled.
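
This description does not state the root cause of the CUDA graph bug in item 3. As an illustration only, one well-known way a kernel can misbehave under CUDA graphs is that kernel arguments are recorded by value at capture time, so a host-side counter or flag passed as an argument keeps its captured value on every replay unless the node parameters are explicitly updated. The sketch below is a host-only C++ toy model of that effect, not the TensorRT-LLM code: `ToyGraph`, `Workspace`, `recordByValue`, `recordFromWorkspace`, and `runThreeReplays` are hypothetical names, and no CUDA API is called.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model of capture/replay: a "graph" is a list of recorded launches.
// This only mimics the by-value capture of kernel arguments; it is not CUDA.
struct ToyGraph {
    std::vector<std::function<void()>> nodes;
    void replay() const {
        for (const auto& node : nodes) node();
    }
};

// Hypothetical stand-in for a sync flag stored in a persistent workspace
// (in a real setup this would live in device memory).
struct Workspace {
    int flag = 0;
};

// Variant A: the flag is passed by value, so recording freezes it.
inline void recordByValue(ToyGraph& g, int hostFlag, std::vector<int>& seen) {
    g.nodes.push_back([hostFlag, &seen] { seen.push_back(hostFlag); });
}

// Variant B: the "kernel" advances and reads the flag kept in the workspace,
// so each replay observes a fresh value.
inline void recordFromWorkspace(ToyGraph& g, Workspace& ws, std::vector<int>& seen) {
    g.nodes.push_back([&ws, &seen] { seen.push_back(++ws.flag); });
}

// Records one launch, replays it three times, and returns the flag values
// that the recorded "kernel" observed.
inline std::vector<int> runThreeReplays(bool useWorkspace) {
    ToyGraph g;
    Workspace ws;
    std::vector<int> seen;
    if (useWorkspace) {
        recordFromWorkspace(g, ws, seen);
    } else {
        recordByValue(g, /*hostFlag=*/1, seen);
    }
    for (int i = 0; i < 3; ++i) g.replay();
    return seen;
}
```

In this model, `runThreeReplays(false)` sees the same flag on every replay, while `runThreeReplays(true)` sees a new value each time. Whether the actual fix in this PR follows that pattern is an assumption; the diff is the authority.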

To merge this PR, we need to wait for the new allreduce kernels in #3064 and then refactor the current allreduce call sites.

yizhang-nv · Mar 25 '25 02:03