[bug] MMHA_USE_FP32_ACUM_FOR_LOGITS and MMHA_USE_FP32_ACCUM_FOR_LOGITS

Open akhoroshev opened this issue 1 year ago • 1 comments

The macro names used at https://github.com/NVIDIA/TensorRT-LLM/blob/b57221b764bc579cbb2490154916a871f620e2c4/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttentionLaunch.h#L56 and https://github.com/NVIDIA/TensorRT-LLM/blob/b57221b764bc579cbb2490154916a871f620e2c4/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderMaskedMultiheadAttentionTemplate.h#L1309 must be the same. Right now one is spelled MMHA_USE_FP32_ACUM_FOR_LOGITS and the other MMHA_USE_FP32_ACCUM_FOR_LOGITS.
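To illustrate why this kind of mismatch goes unnoticed, here is a minimal sketch. It is not the actual TensorRT-LLM code: the helper functions are hypothetical, and which spelling is defined is an assumption made only for the example. The point is that `#ifdef` on an undefined name is not an error, so the two checks silently disagree.

```cpp
// Hypothetical reduction of the mismatch; NOT the real TensorRT-LLM sources.
// Assumption for this sketch: only the double-C spelling is defined.
#define MMHA_USE_FP32_ACCUM_FOR_LOGITS

// A check that uses the same spelling as the definition.
constexpr bool checkWithDoubleC()
{
#ifdef MMHA_USE_FP32_ACCUM_FOR_LOGITS
    return true;
#else
    return false;
#endif
}

// A check that uses the single-C spelling. The preprocessor treats the
// unknown name as "not defined" and takes the #else branch without warning.
constexpr bool checkWithSingleC()
{
#ifdef MMHA_USE_FP32_ACUM_FOR_LOGITS
    return true;
#else
    return false;
#endif
}

// The two code paths disagree about whether FP32 accumulation is enabled.
static_assert(checkWithDoubleC() != checkWithSingleC(),
    "the two spellings select different branches");
```

Using one spelling in both places makes both checks take the same branch.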

Jan 30 '24 16:01 akhoroshev

Thanks, we will fix that soon.

Jan 31 '24 06:01 PerkzZheng
