Yoon Noh Lee
Thank you for the reply. The CUTLASS [example](https://github.com/NVIDIA/cutlass/blob/v3.5.1/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu) says its code was upstreamed from xFormers: > Acknowledgement: Fixed-sequence-length FMHA code was upstreamed by Meta xFormers (https://github.com/facebookresearch/xformers). Therefore I think xformers...
Thank you for the reply. I tested the GEMM dimensions (8192, 8192, 28672). When I test with integers within an exactly representable range, the results of the CUTLASS and cuBLAS GEMMs match. Also when...
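As an illustration of that methodology, here is a minimal sketch (not the poster's actual harness; the integer range, the scaled-down dimensions, and the fp64 reference are illustrative assumptions). With small-integer operands, every product and partial sum stays exactly representable in fp32, so any two correct GEMM implementations must agree bit-for-bit:

```python
import torch

# A minimal sketch, assuming CUDA and PyTorch. The integer range [-4, 4]
# and the problem size are illustrative, scaled down from the
# (8192, 8192, 28672) case discussed above.
torch.backends.cuda.matmul.allow_tf32 = False  # force true fp32 GEMM

M, N, K = 1024, 1024, 4096

torch.manual_seed(0)
a = torch.randint(-4, 5, (M, K), device="cuda").float()
b = torch.randint(-4, 5, (K, N), device="cuda").float()
# Each product satisfies |a*b| <= 16, and each dot product satisfies
# |sum| <= 16 * K = 65536 < 2^24, so every intermediate value is an
# integer exactly representable in fp32: no rounding can occur.

c_gpu = a @ b                              # cuBLAS SGEMM on the GPU
c_ref = (a.double() @ b.double()).float()  # exact fp64 reference

print("bitwise match:", torch.equal(c_gpu, c_ref))
```

If the bitwise check fails outside this exactly representable regime, the mismatch reflects accumulation-order differences rather than a bug in either library.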
Thank you for the reply. FlashAttention concentrates on A100 and H100 kernels. I'm curious whether the FlashAttention kernel is efficient on Jetson AGX or RTX-series GPUs. If it is not efficient,...
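One way to probe that empirically is a micro-benchmark like the sketch below (an assumption-laden illustration, not an official test: it uses PyTorch 2.2+ and its built-in FlashAttention SDPA backend rather than the standalone flash-attn package, and the tensor shapes are arbitrary), timing the flash backend against the memory-efficient backend on the target GPU, e.g. an SM86 RTX card or an SM87 Jetson AGX Orin:

```python
import time
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

# Hypothetical micro-benchmark: compare PyTorch's FlashAttention SDPA
# backend with the memory-efficient backend on the local GPU.
print("compute capability:", torch.cuda.get_device_capability())

q, k, v = (torch.randn(8, 16, 2048, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

def bench(backend, iters=20):
    with sdpa_kernel([backend]):
        for _ in range(3):  # warm-up
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        return (time.perf_counter() - t0) / iters

print("flash        :", bench(SDPBackend.FLASH_ATTENTION))
print("mem-efficient:", bench(SDPBackend.EFFICIENT_ATTENTION))
```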
Thank you for the reply. I will try it on Sm8x-series GPUs.