FasterTransformer
src/fastertransformer/kernels/decoder_masked_multihead_attention/decoder_masked_multihead_attention_template.hpp:36: enabling this macro definition causes a build error
Branch/Tag/Commit
main
Docker Image Version
nvcr.io/nvidia/pytorch:22.08-py3
GPU name
A10
CUDA Driver
515.65.01
Reproduced Steps
https://github.com/NVIDIA/FasterTransformer/blob/f0b5b8631806aedfbe0d844eb9a32202002dd463/src/fastertransformer/kernels/decoder_masked_multihead_attention/decoder_masked_multihead_attention_template.hpp#L38
Enable the macro "MMHA_USE_FP32_ACUM_FOR_LOGITS" and build; compilation fails with errors.
How should this macro be enabled, and what else needs to be changed for the code to compile?
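For context, a preprocessor guard like this is normally switched on in one of two ways: uncomment the `#define` in the header, or pass `-DMMHA_USE_FP32_ACUM_FOR_LOGITS` to the compiler (for example on the `nvcc` command line). The sketch below is a hypothetical, self-contained illustration of those two mechanisms only. The helper `fp32_logits_enabled()` is not part of FasterTransformer, and the sketch does not reproduce or fix the code inside the real guarded blocks, which is presumably where the reported compile errors come from.

```cpp
// Hypothetical illustration only; not FasterTransformer code.

// Option 1: enable the knob by uncommenting the define in the header.
// #define MMHA_USE_FP32_ACUM_FOR_LOGITS

// Option 2: leave the header alone and define it at compile time, e.g.
//   nvcc -DMMHA_USE_FP32_ACUM_FOR_LOGITS ...

// Reports at compile time whether the macro was defined by either option.
constexpr bool fp32_logits_enabled() {
#ifdef MMHA_USE_FP32_ACUM_FOR_LOGITS
    return true;   // guarded FP32-logits code paths would be compiled
#else
    return false;  // default build: guarded code is skipped
#endif
}
```

Compiled as-is, `fp32_logits_enabled()` returns `false`; with either option applied it returns `true`, and every `#ifdef MMHA_USE_FP32_ACUM_FOR_LOGITS` block in the translation unit is compiled as well.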