HeKa
@mnicely Thank you very much for your answer. May I ask how much improvement your evaluation shows compared to Dao-AILab FlashAttention 2?
@mnicely I recently noticed the speed-up benchmark in the cuDNN release notes. Yes, it looks great. But are there any more details on the QKV shapes and other settings? A single...
@gautam20197 head (d) = 128 with any batch size or sequence length?
> I think you can check your use case using the PyTorch nightlies. `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121`
>
> And running the PyTorch SDPA example https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html...
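For reference, a minimal sketch of the SDPA call from the linked tutorial; the tensor shapes here (batch=2, heads=8, seq_len=128, head_dim=64) are illustrative assumptions, not values from this thread:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim).
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# PyTorch dispatches to a fused backend (e.g. flash attention)
# when the shapes, dtype, and device support it.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```

On CUDA you can check which backend was used via `torch.nn.attention.sdpa_kernel` context managers, as the tutorial shows.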
Python 3.10 is compatible. You could build TFRA yourself or just wait a while.
Has the bug been fixed?
@Mr-Nineteen Could you look into this problem when it is convenient for you?