Add FlashAttention Kernel in Triton
TL;DR: Add an implementation of FlashAttention written in OpenAI's Triton language.
Background:
- FlashAttention: an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU HBM and on-chip SRAM. It reports a 15% end-to-end speedup on BERT-large over the MLPerf 1.1 training record, a 3× speedup on GPT-2, and a 2.4× speedup on Long-Range Arena. A minimal sketch of the tiling idea follows this list.
- Triton: a Python-like programming language for writing highly efficient GPU code.
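To make the tiling idea concrete, here is a minimal PyTorch sketch of blocked attention with an online softmax, which is the math FlashAttention fuses on-chip. This is an illustration only, not the Triton kernel in this PR; the name `tiled_attention`, the `block_size` value, and the fp32 cast are assumptions made for readability.

```python
import torch

def tiled_attention(q, k, v, block_size=128):
    """Illustrative sketch of FlashAttention's tiling + online softmax.

    q, k, v: [seq_len, head_dim]. The real kernel fuses these loops on the
    GPU so the full attention matrix never hits HBM; this PyTorch loop only
    shows that the blocked math reproduces exact softmax(QK^T)V.
    """
    q, k, v = q.float(), k.float(), v.float()  # keep the sketch in fp32
    seq_len, head_dim = q.shape
    scale = head_dim ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"), device=q.device)
    row_sum = torch.zeros(seq_len, 1, device=q.device)

    for start in range(0, seq_len, block_size):
        k_blk = k[start:start + block_size]           # load one K tile
        v_blk = v[start:start + block_size]           # load one V tile
        scores = (q @ k_blk.T) * scale                # [seq_len, block_size]

        blk_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, blk_max)
        correction = torch.exp(row_max - new_max)     # rescale old partial sums
        p = torch.exp(scores - new_max)

        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ v_blk
        row_max = new_max

    return out / row_sum

# sanity check against naive attention
q, k, v = (torch.randn(256, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4)
```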
Major Changes:
- Add a FlashAttention forward pass to ParlAI
- Replace encoder self-attention with FlashAttention
- Add a unit test for self-attention functionality to ensure correctness within an acceptable epsilon (0.01); a sketch of this check follows this list
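The correctness check described above could look roughly like the following. `flash_attention_forward` is a hypothetical stand-in for the Triton-backed forward pass in this PR (the actual ParlAI module names aren't shown here), and the tensor shapes are illustrative.

```python
import torch

def test_self_attention_matches_reference(flash_attention_forward):
    """Sketch of the correctness check: Triton output vs. naive attention.

    `flash_attention_forward(q, k, v)` is a hypothetical stand-in for the
    Triton-backed forward pass added in this PR, not an actual ParlAI API.
    """
    torch.manual_seed(0)
    bsz, heads, seq_len, head_dim = 2, 8, 512, 64
    shape = (bsz, heads, seq_len, head_dim)
    q = torch.randn(shape, device="cuda", dtype=torch.float16)
    k = torch.randn(shape, device="cuda", dtype=torch.float16)
    v = torch.randn(shape, device="cuda", dtype=torch.float16)

    scale = head_dim ** -0.5
    ref = torch.softmax((q @ k.transpose(-2, -1)).float() * scale, dim=-1) @ v.float()
    out = flash_attention_forward(q, k, v)

    # "acceptable epsilon" from the change list above
    assert torch.allclose(out.float(), ref, atol=1e-2)
```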
Evaluations:
- unit tests; runtimes by context length are tabulated below, and a timing sketch follows this section
| N_CTX | Triton | ParlAI |
|---|---|---|
| 512.0 | 0.333863 | 0.696174 |
| 1024.0 | 1.005227 | 2.410752 |
| 2048.0 | 3.513344 | 9.326592 |
| 4096.0 | 13.148160 | 37.028866 |
- ConvAI2 results
  - runtime
  - qualitative results
  - quantitative results
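The harness that produced the runtimes in the table above isn't shown in this PR; the sketch below is one generic way such a sweep could be reproduced with CUDA events. `attention_fn`, the batch/head/head-dim sizes, and the warmup/iteration counts are assumptions.

```python
import torch

def bench_ms(fn, warmup=10, iters=50):
    """Average milliseconds per call for a CUDA function, via CUDA events."""
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Sweep the same context lengths as the table above.
# `attention_fn` is a placeholder for either implementation under test.
for n_ctx in (512, 1024, 2048, 4096):
    q = torch.randn(1, 16, n_ctx, 64, device="cuda", dtype=torch.float16)
    k, v = torch.randn_like(q), torch.randn_like(q)
    # print(n_ctx, bench_ms(lambda: attention_fn(q, k, v)))
```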
Testing Steps:
Wow, this heroic change set!
Is this ready for review, or still a draft?
@klshuster I don't think the code will ever be mergeable given Triton's experimental nature. I talked with some people who worked on FlashAttention, and it seems Triton's implementation only works with specific head dims. I asked @pearlli98 to open a PR in order to save her WIP, and we could review the code if we want to.
perhaps we can merge some of the results to an internal project directory?
@klshuster I can push the results to parlai-internal. The problem is that we don't have the transformer module files, which I made changes to, in the internal repo. Do you want me to split the changes across the two places (code files here and results on internal)?
yes that would be great
@klshuster I have removed the results folder and moved it to ParlAI-internal under this PR.
Hi @pearlli98!
Thank you for your pull request.
We require contributors to sign our Contributor License Agreement, and yours needs attention.
You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!