FasterTransformer
Incomplete explanation in a FusedAttentionLayer.h comment
Branch/Tag/Commit: v5.3_tag
Docker Image Version: 22.08
GPU name: RTX 3070
CUDA Driver: 470.129.06
Reproduced Steps

The comment at https://github.com/NVIDIA/FasterTransformer/blob/release/v5.3_tag/src/fastertransformer/layers/attention_layers/FusedAttentionLayer.h#L28 breaks off mid-sentence:

```cpp
// This class is only used when we satisfy the following conditions:
// 1. FP16
// 2. Temporally add seqlen <= 512 limitation because the
template<typename T>
```

Condition 2 ends abruptly at "because the", so the reason for the seqlen <= 512 limitation is never stated. Could the comment be completed to explain why the fused attention path is restricted to sequence lengths of 512 or less?