Yoon Noh Lee issues

Results 4 issues of Yoon Noh Lee

Question about fused multi-head attention in trtllm-build

When using trtllm-build, we can use fused attention in LLMs. I wonder whether TensorRT-LLM automatically lowers the FMHA data precision. I want to use TensorRT-LLM without any reduction in data type precision, for accuracy.

[QST] Where is the FlashAttention-2 CUTLASS kernel?

Hello, I am studying the fused_multi_head_attention example in CUTLASS. The CUTLASS 3.5.1 README.md says that a FlashAttention-2 kernel is in CUTLASS, but the fused_multi_head_attention example is based on Meta's xFormers. I...

question

? - Needs Triage

CUTLASS Fused multi head attention

# ❓ Questions and Help Hello, I am looking at fused multi-head attention in 3rdparty/cutlass. In cutlass/examples, fused multi-head attention is upstream to xformers. And CUTLASS says fused multi head...

[BUG] Accuracy Error in CUTLASS GEMM operations.

**Describe the bug** The results of GEMM operations in CUTLASS (TensorOp, Simt) do not fully match the results of cuBLAS GEMM. **Steps/Code to reproduce bug** ``` using GEMM...

bug