Megatron-DeepSpeed
Add FlashAttention
This PR adds an option to use FlashAttention. Inspired by https://github.com/NVIDIA/Megatron-LM/pull/267
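
For readers unfamiliar with the idea, here is a rough, pure-Python sketch of what an opt-in attention path could look like. It is not the code in this PR and not the FlashAttention CUDA kernel. The flag name `--use-flash-attn` and the helpers `build_parser`, `naive_attention`, `tiled_attention`, and `attention` are hypothetical names chosen for illustration. The tiled path only demonstrates the block-wise online softmax that FlashAttention builds on, and it should produce the same result as the naive path up to floating-point error.

```python
import argparse
import math


def build_parser():
    # Hypothetical flag name; the option actually added by the PR may differ.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use-flash-attn", action="store_true",
                        help="use the tiled attention path instead of the naive one")
    return parser


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, k, v):
    """Reference path: computes the full row of scores for each query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * _dot(qi, kj) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block=2):
    """Visits keys/values block by block, keeping a running max and sum
    (online softmax) so the full score row is never stored at once."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * len(v[0])
        for start in range(0, len(k), block):
            k_blk = k[start:start + block]
            v_blk = v[start:start + block]
            scores = [scale * _dot(qi, kj) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated under the previous maximum.
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                acc = [a + w * x for a, x in zip(acc, vj)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


def attention(q, k, v, use_flash_attn=False):
    # The option only selects the implementation; the math is unchanged.
    if use_flash_attn:
        return tiled_attention(q, k, v)
    return naive_attention(q, k, v)
```

With this shape, the default behaviour stays on the existing path and the new path is only taken when the flag is passed, for example `attention(q, k, v, use_flash_attn=build_parser().parse_args().use_flash_attn)`.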
cc @thomasw21