TensorRT-LLM
[bug] MMHA_USE_FP32_ACUM_FOR_LOGITS and MMHA_USE_FP32_ACCUM_FOR_LOGITS
The macro name used at https://github.com/NVIDIA/TensorRT-LLM/blob/b57221b764bc579cbb2490154916a871f620e2c4/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttentionLaunch.h#L56 and the one used at https://github.com/NVIDIA/TensorRT-LLM/blob/b57221b764bc579cbb2490154916a871f620e2c4/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttentionTemplate.h#L1309 are spelled differently (ACUM vs. ACCUM). They must be the same name.
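To illustrate why this kind of mismatch matters, here is a minimal sketch, not the actual TensorRT-LLM code. It assumes one site defines the flag and the other tests it with `#ifdef`; which file carries which spelling is an assumption here, and `logitsUseFp32Accum` is a hypothetical helper added only for the demonstration. Because the preprocessor treats the two spellings as unrelated macros, the guarded branch is skipped silently, with no warning or error.

```cpp
#include <cassert>

// Hypothetical stand-in for the site that enables the feature.
// Note the single "C" in ACUM (assumed spelling for this sketch).
#define MMHA_USE_FP32_ACUM_FOR_LOGITS

// Hypothetical stand-in for the site that checks the feature.
// It tests the double-"C" spelling, which nothing defines.
constexpr bool logitsUseFp32Accum()
{
#ifdef MMHA_USE_FP32_ACCUM_FOR_LOGITS
    return true;  // intended path: accumulate logits in FP32
#else
    return false; // path actually compiled: the flag has no effect
#endif
}

// The compiler accepts this without complaint: the flag is "on",
// yet the code it was meant to enable is not selected.
static_assert(!logitsUseFp32Accum(),
              "mismatched macro spelling leaves the FP32 path disabled");
```

A runtime check such as `assert(!logitsUseFp32Accum());` passes for the same reason. Renaming either macro so that both sites use one spelling makes the helper return `true`.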
Thanks. We will fix that soon.