Vedant Roy comments

Results: 96 comments by Vedant Roy

[Question] How to match Flash Attention 2 performance?

cuDNN version: `9.1.0`. nvidia-smi reports NVIDIA-SMI 535.183.01, Driver Version: 535.183.01, CUDA Version: 12.2 ...

[Question] How to match Flash Attention 2 performance?

[Archive.zip](https://github.com/user-attachments/files/16607253/Archive.zip) I've attached both the benchmarking code and the cuDNN wrapper code to this post. I suspect the benchmarking code is off, so I'll switch to something simpler (like the PyTorch profiler),...

[Question] How to force certain computations to occur in float16?
