
Potential bug in mixed GEMM kernel and scale iterator

Open · ginowu opened this issue 8 months ago · 1 comment

In the code below, the scale row count is divided by 64: https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/gemm/kernel/fpA_intB_gemm.h#L415

Then, when calculating the threadblock row offset, it is multiplied by 64 again: https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/transform/threadblock/fine_grained_scale_zero_iterator.h#L161

These two 64 constants appear to be unnecessary; could you explain the reasoning behind them? Also, if a group size of 32 is ever supported in the future, this arithmetic will produce a divide by zero.
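
For reference, here is a minimal standalone sketch of the round-trip arithmetic being questioned. The identifiers (`scale_row_groups`, `tb_row_offset`, `kGroupSize`) are illustrative only, not the actual TensorRT-LLM symbols:

```cpp
#include <cassert>
#include <cstdio>

int main()
{
    int const kScaleRows = 128; // rows of the scale tensor (example value)
    int const kGroupSize = 64;  // fine-grained quantization group size

    // Step 1 (as in fpA_intB_gemm.h): the scale row count is divided by 64.
    int const scale_row_groups = kScaleRows / 64;

    // Step 2 (as in fine_grained_scale_zero_iterator.h): the threadblock
    // row offset is recovered by multiplying by 64 again.
    int const tb_row_offset = scale_row_groups * 64;

    // When kScaleRows is a multiple of 64 the round-trip is a no-op, which
    // is why the two constants look redundant.
    assert(tb_row_offset == kScaleRows);

    // Hypothetical concern: if a group size of 32 were fed through the same
    // arithmetic, the integer quotient 32 / 64 truncates to 0, and any later
    // division by that quotient would divide by zero.
    printf("32 / 64 = %d\n", 32 / 64); // prints 0
    return 0;
}
```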

ginowu · Jul 05 '24 09:07