FasterTransformer
Incomplete explanation in a FusedAttentionLayer.h comment
Branch/Tag/Commit: v5.3_tag
Docker Image Version: 22.08
GPU name: RTX 3070
CUDA Driver: 470.129.06
Reproduced Steps

The comment at https://github.com/NVIDIA/FasterTransformer/blob/release/v5.3_tag/src/fastertransformer/layers/attention_layers/FusedAttentionLayer.h#L28 breaks off mid-sentence:

```cpp
// This class is only used when we satisfy the following conditions:
// 1. FP16
// 2. Temporally add seqlen <= 512 limitation because the
template<typename T>
```

Condition 2 ends abruptly at "because the", so the reason for the seqlen <= 512 limitation is never stated. Could the comment be completed to explain why the fused attention path is restricted to sequence lengths of 512 or less?