gpt-neox
block-sparse flash attention support
I saw that flash attention was recently merged.
Block-sparse flash attention, an approximate attention variant, would be cool to have as well for training with very large sequence lengths: https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_blocksparse_attention.py
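For anyone picking this up, here is a minimal reference sketch (not the linked fused kernel, and not the existing gpt-neox attention API) of what block-sparse attention computes: a coarse block-level layout decides which query blocks may attend to which key blocks, so a real kernel only touches the active blocks instead of the full seq_len x seq_len matrix. All names, shapes, and the layout choice below are illustrative assumptions.

```python
# Reference sketch of block-sparse attention in plain PyTorch.
# A fused kernel would skip masked blocks entirely instead of
# materialising dense scores; this is only to show the semantics.
import torch
import torch.nn.functional as F


def block_sparse_attention(q, k, v, block_layout, block_size):
    """q, k, v: (batch, heads, seq_len, head_dim)
    block_layout: (heads, seq_len // block_size, seq_len // block_size) bool,
                  True where a query block may attend to a key block."""
    b, h, s, d = q.shape
    scale = d ** -0.5
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) * scale
    # Expand the block-level layout to a token-level mask.
    mask = block_layout.repeat_interleave(block_size, dim=-2)
    mask = mask.repeat_interleave(block_size, dim=-1)  # (heads, seq, seq)
    scores = scores.masked_fill(~mask.unsqueeze(0), float("-inf"))
    return torch.einsum("bhqk,bhkd->bhqd", F.softmax(scores, dim=-1), v)


if __name__ == "__main__":
    B, H, S, D, BS = 2, 4, 256, 64, 32
    q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
    n_blocks = S // BS
    # Example layout: block-diagonal (local) attention plus a global first block.
    layout = torch.zeros(H, n_blocks, n_blocks, dtype=torch.bool)
    layout |= torch.eye(n_blocks, dtype=torch.bool)
    layout[:, :, 0] = True
    out = block_sparse_attention(q, k, v, layout, BS)
    print(out.shape)  # torch.Size([2, 4, 256, 64])
```

The memory win comes entirely from the kernel never building the masked blocks; the sketch above still builds them densely, so it is only useful as a correctness reference.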
Hello, I am new to the EleutherAI team and I think this would be a good issue to try to solve. May I be assigned this task, please?
@natek-1 Welcome! Thank you for your contribution.
Hey @natek-1, do you have any updates on this? It's totally alright if you haven't gotten a chance to look at it. Would it be alright if we assigned it to someone else?