Tri Dao


Yeah, I'm not sure. It works fine with the [PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) container from NVIDIA (e.g. 23.05 has PyTorch 2.0) after `pip install triton==2.0.0.dev20221202`.

I'm not sure what's wrong; I use the PyTorch Docker container and haven't seen this error.

Attention mask isn't supported (either in v1 or v2). I might implement it at some point but there are other priorities now.

> Curious what this would take or if it is still out of scope for the flash attention library?

Not out of scope, it's just that someone needs to go implement...

> I was wondering if there have been any updates on this? AlphaFold3 uses a lot of attention pair biasing and it would be tremendously useful to computational biology if...

As you can see in the code, `key_padding_mask` just removes elements from the keys and values before they are passed to the flash attention kernel; no attention mask is passed to the kernel itself.
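To make that concrete, here's a minimal sketch of what that unpadding step looks like, assuming `key_padding_mask` is a boolean `(batch, seqlen)` tensor with `True` for real tokens. It's an illustration, not the library's actual code:

```python
import torch

def unpad_kv(k, v, key_padding_mask):
    # k, v: (batch, seqlen, nheads, headdim)
    # key_padding_mask: (batch, seqlen) boolean, True = real token, False = padding.
    batch, seqlen = key_padding_mask.shape
    # Indices of the non-padded positions in the flattened (batch * seqlen) axis.
    idx = torch.nonzero(key_padding_mask.flatten(), as_tuple=False).flatten()
    # Keep only those rows; the kernel then never sees the padded entries,
    # so it doesn't need an explicit attention mask.
    k_unpad = k.reshape(batch * seqlen, *k.shape[2:])[idx]
    v_unpad = v.reshape(batch * seqlen, *v.shape[2:])[idx]
    return k_unpad, v_unpad
```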

Thanks so much for the great work, and congrats on the speedup on Uni-Fold! I'll have more time this weekend to review carefully.

> Attention Mask
> ...

@guolinke @robotcator Do we need both mask & bias, or would a single bias suffice? I think that could simplify the code & reduce compilation time.
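For what it's worth, here's a small sketch of why a single additive bias would be enough, assuming the bias is added to the attention scores before the softmax (a boolean mask is just the special case of a 0 / -inf bias). The function name is made up for illustration:

```python
import torch

def mask_to_bias(attn_mask: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
    # attn_mask: boolean, True = attend, False = masked out.
    # Positions to attend get bias 0; masked positions get -inf, so they vanish
    # after the softmax. A kernel that takes a bias therefore subsumes a mask.
    bias = torch.zeros(attn_mask.shape, dtype=dtype)
    return bias.masked_fill(~attn_mask, float("-inf"))

# scores: (batch, nheads, seqlen_q, seqlen_k) pre-softmax attention logits
# attn = torch.softmax(scores + mask_to_bias(attn_mask) + pair_bias, dim=-1)
```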

> the flatten-non-padding input is not trivial in alphafold2.

I see, thanks for explaining, this is very helpful. How about we pass in a tensor (type int) with the sequence...
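Roughly what I have in mind (just a sketch with made-up values, assuming the tokens are packed into a single `(total_tokens, nheads, headdim)` tensor):

```python
import torch

# Two sequences of lengths 3 and 5 in a padded batch of width 5.
key_padding_mask = torch.tensor([[1, 1, 1, 0, 0],
                                 [1, 1, 1, 1, 1]], dtype=torch.bool)

# Per-sequence lengths as an int tensor, plus cumulative offsets.
seqlens = key_padding_mask.sum(dim=1).to(torch.int32)                      # [3, 5]
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.int32),
                        torch.cumsum(seqlens, dim=0, dtype=torch.int32)])  # [0, 3, 8]

# With the tokens packed into one (total_tokens, ...) tensor, sequence i lives in
# rows cu_seqlens[i]:cu_seqlens[i + 1]; that is all the kernel needs to locate it.
```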

Another way to phrase this question: is the mask for each sequence always of the form [0, 0, ..., 0, -inf, -inf, ...]? Or could it have the form [0,...
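Concretely, the distinction I mean (toy length-5 examples):

```python
import torch

# Suffix-only masking: fully described by a single length per sequence (here 3),
# so it never needs to reach the kernel as an explicit mask.
suffix_mask = torch.tensor([0., 0., 0., float("-inf"), float("-inf")])

# Arbitrary pattern: masked positions in the middle can't be captured by a
# length alone, so it would need a real mask (or bias) inside the kernel.
arbitrary_mask = torch.tensor([0., float("-inf"), 0., float("-inf"), 0.])
```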