KUNPENG GUO

Does anyone have an update on this PR? It would be great to see it merged into the main codebase.

I got this error, which might be related:

```
FlashAttention backward for head dim > 64 requires A100 or H100 GPUs as the implementation needs a large amount of shared memory.
```