Tri Dao


Yeah, I'm not sure. It works fine with the [PyTorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) container from NVIDIA (e.g. 23.05 has PyTorch 2.0) after `pip install triton==2.0.0.dev20221202`.

I'm not sure what's wrong; I use the PyTorch Docker container and haven't seen this error.

Attention mask isn't supported (either in v1 or v2). I might implement it at some point but there are other priorities now.

> Curious what this would take or if it is still out of scope for the flash attention library?

Not out of scope, it's just that someone needs to go implement...

> I was wondering if there have been any updates on this? AlphaFold3 uses a lot of attention pair biasing and it would be tremendously useful to computational biology if...

As you can see in the code, `key_padding_mask` just removes elements from the keys and values before they are passed to the flash attention kernel; no attention mask is passed to the kernel itself.
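To make that concrete, here's a minimal sketch of what that unpadding step looks like, assuming `key_padding_mask` is a boolean `(batch, seqlen)` tensor with `True` for real tokens. It's an illustration, not the library's actual code:

```python
import torch

def unpad_kv(k, v, key_padding_mask):
    # k, v: (batch, seqlen, nheads, headdim)
    # key_padding_mask: (batch, seqlen) boolean, True = real token, False = padding.
    batch, seqlen = key_padding_mask.shape
    # Indices of the non-padded positions in the flattened (batch * seqlen) axis.
    idx = torch.nonzero(key_padding_mask.flatten(), as_tuple=False).flatten()
    # Keep only those rows; the kernel then never sees the padded entries,
    # so it doesn't need an explicit attention mask.
    k_unpad = k.reshape(batch * seqlen, *k.shape[2:])[idx]
    v_unpad = v.reshape(batch * seqlen, *v.shape[2:])[idx]
    return k_unpad, v_unpad
```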

Thanks so much for the great work, and congrats on the speedup on Uni-Fold! I'll have more time this weekend to review carefully.

> Attention Mask
> ...

@guolinke @robotcator Do we need both mask & bias, or would a single bias suffice? I think that could simplify the code & reduce compilation time.
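For what it's worth, here's a small sketch of why a single additive bias would be enough, assuming the bias is added to the attention scores before the softmax (a boolean mask is just the special case of a 0 / -inf bias). The function name is made up for illustration:

```python
import torch

def mask_to_bias(attn_mask: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
    # attn_mask: boolean, True = attend, False = masked out.
    # Positions to attend get bias 0; masked positions get -inf, so they vanish
    # after the softmax. A kernel that takes a bias therefore subsumes a mask.
    bias = torch.zeros(attn_mask.shape, dtype=dtype)
    return bias.masked_fill(~attn_mask, float("-inf"))

# scores: (batch, nheads, seqlen_q, seqlen_k) pre-softmax attention logits
# attn = torch.softmax(scores + mask_to_bias(attn_mask) + pair_bias, dim=-1)
```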

> the flatten-non-padding input is not trivial in alphafold2.

I see, thanks for explaining, this is very helpful. How about we pass in a tensor (type int) with the sequence...
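Roughly what I have in mind (just a sketch with made-up values, assuming the tokens are packed into a single `(total_tokens, nheads, headdim)` tensor):

```python
import torch

# Two sequences of lengths 3 and 5 in a padded batch of width 5.
key_padding_mask = torch.tensor([[1, 1, 1, 0, 0],
                                 [1, 1, 1, 1, 1]], dtype=torch.bool)

# Per-sequence lengths as an int tensor, plus cumulative offsets.
seqlens = key_padding_mask.sum(dim=1).to(torch.int32)                      # [3, 5]
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.int32),
                        torch.cumsum(seqlens, dim=0, dtype=torch.int32)])  # [0, 3, 8]

# With the tokens packed into one (total_tokens, ...) tensor, sequence i lives in
# rows cu_seqlens[i]:cu_seqlens[i + 1]; that is all the kernel needs to locate it.
```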

Another way to phrase this question: is the mask for each sequence always of the form [0, 0, ..., 0, -inf, -inf, ...]? Or could it have the form [0,...
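Concretely, the distinction I mean (toy length-5 examples):

```python
import torch

# Suffix-only masking: fully described by a single length per sequence (here 3),
# so it never needs to reach the kernel as an explicit mask.
suffix_mask = torch.tensor([0., 0., 0., float("-inf"), float("-inf")])

# Arbitrary pattern: masked positions in the middle can't be captured by a
# length alone, so it would need a real mask (or bias) inside the kernel.
arbitrary_mask = torch.tensor([0., float("-inf"), 0., float("-inf"), 0.])
```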