Yoon Noh Lee

Results 4 issues of Yoon Noh Lee

When using trtllm-build, we can use fused attention in LLMs. I wonder whether TensorRT-LLM automatically lowers the FMHA data precision. I want to use TensorRT-LLM without any reduction in data type precision, for accuracy.

Hello, I am studying the fused_multi_head_attention example in CUTLASS. The CUTLASS 3.5.1 README.md says that a Flash Attention 2 kernel is in CUTLASS. But the fused_multi_head_attention example is based on Meta's xFormers. I...

question
? - Needs Triage

# ❓ Questions and Help Hello, I am looking at fused multi-head attention in 3rdparty/cutlass. In cutlass/examples, fused multi-head attention is upstream to xformers. And CUTLASS said fused multi head...

**Describe the bug** The accuracy of GEMM results in CUTLASS (TensorOp, Simt) does not fully match the accuracy of cuBLAS GEMM results. **Steps/Code to reproduce bug** ``` using GEMM...

bug