Yoon Noh Lee
Thank you for the reply. The CUTLASS [example](https://github.com/NVIDIA/cutlass/blob/v3.5.1/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu) says its code was upstreamed from xFormers: > Acknowledgement: Fixed-sequence-length FMHA code was upstreamed by Meta xFormers (https://github.com/facebookresearch/xformers). Therefore I think xformers...
Thank you for the reply. I tested the GEMM dimensions (8192, 8192, 28672). When I test with integers within an exactly representable range, the results of the CUTLASS and cuBLAS GEMMs match. Also when...
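As an illustration of that methodology, here is a minimal sketch (not the poster's actual harness; the integer range, the scaled-down dimensions, and the fp64 reference are illustrative assumptions). With small-integer operands, every product and partial sum stays exactly representable in fp32, so any two correct GEMM implementations must agree bit-for-bit:

```python
import torch

# A minimal sketch, assuming CUDA and PyTorch. The integer range [-4, 4]
# and the problem size are illustrative, scaled down from the
# (8192, 8192, 28672) case discussed above.
torch.backends.cuda.matmul.allow_tf32 = False  # force true fp32 GEMM

M, N, K = 1024, 1024, 4096

torch.manual_seed(0)
a = torch.randint(-4, 5, (M, K), device="cuda").float()
b = torch.randint(-4, 5, (K, N), device="cuda").float()
# Each product satisfies |a*b| <= 16, and each dot product satisfies
# |sum| <= 16 * K = 65536 < 2^24, so every intermediate value is an
# integer exactly representable in fp32: no rounding can occur.

c_gpu = a @ b                              # cuBLAS SGEMM on the GPU
c_ref = (a.double() @ b.double()).float()  # exact fp64 reference

print("bitwise match:", torch.equal(c_gpu, c_ref))
```

If the bitwise check fails outside this exactly representable regime, the mismatch reflects accumulation-order differences rather than a bug in either library.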
Thank you for the reply. FlashAttention concentrates on A100 and H100 kernels. I'm curious whether the FlashAttention kernel is efficient on Jetson AGX or RTX-series GPUs. If it is not efficient,...
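One way to probe that empirically is a micro-benchmark like the sketch below (an assumption-laden illustration, not an official test: it uses PyTorch 2.2+ and its built-in FlashAttention SDPA backend rather than the standalone flash-attn package, and the tensor shapes are arbitrary), timing the flash backend against the memory-efficient backend on the target GPU, e.g. an SM86 RTX card or an SM87 Jetson AGX Orin:

```python
import time
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

# Hypothetical micro-benchmark: compare PyTorch's FlashAttention SDPA
# backend with the memory-efficient backend on the local GPU.
print("compute capability:", torch.cuda.get_device_capability())

q, k, v = (torch.randn(8, 16, 2048, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

def bench(backend, iters=20):
    with sdpa_kernel([backend]):
        for _ in range(3):  # warm-up
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        torch.cuda.synchronize()
        return (time.perf_counter() - t0) / iters

print("flash        :", bench(SDPBackend.FLASH_ATTENTION))
print("mem-efficient:", bench(SDPBackend.EFFICIENT_ATTENTION))
```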
Thank you for the reply. I will try it on Sm8x-series GPUs.