HeKa

17 comments by HeKa

@mnicely Thank you very much for your answer. May I ask how much improvement you measured compared to Dao-AILab FlashAttention 2 in your evaluation?

@mnicely I noticed the speed-up benchmark in the cuDNN release notes recently. Yes, it looks great. But are there any more details on the QKV shapes and other settings? A single...

@gautam20197 Head dimension (d) = 128 with any batch size or sequence length?

> I think you can check your use case using the PyTorch nightlies. `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121`
>
> And running the PyTorch SDPA example https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html...
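A minimal sketch of such a check, along the lines of that tutorial: the shapes, dtype, and causal flag below are assumptions for illustration, not the configuration discussed above. Restricting SDPA to the flash backend makes it error out instead of silently falling back when a given QKV shape is not supported.

```python
# Minimal sketch (assumed shapes/dtype) for checking whether the flash
# attention kernel handles a given QKV configuration via PyTorch SDPA.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 4, 16, 2048, 128  # example configuration
device, dtype = "cuda", torch.float16

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Allow only the flash backend; a RuntimeError here means this QKV shape
# is not supported by the flash kernel in the installed build.
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)
```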

Python 3.10 is compatible. You could build TFRA yourself or just wait a while.

@Mr-Nineteen Could you look into this problem when it is convenient for you?