maxtext
Added support for cuDNN flash attention
- Implemented cuDNN flash attention via Transformer Engine
- Currently it supports head_dim up to 128 and does not yet support GQA (a dispatch sketch follows this list). The underlying API is unstable; we will soon move to a stable one with broader feature support
- Added a `--multiprocess_gpu` flag to support multiprocess SLURM tasks (see the initialization sketch after this list)
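For context on the limitation above, here is a minimal sketch of the kind of gating it implies (the names `can_use_cudnn_flash` and `reference_attention` are hypothetical illustrations, not the actual MaxText or Transformer Engine code): take the fused path only when head_dim is at most 128 and the query and KV head counts match, and fall back to plain dot-product attention otherwise.

```python
import jax
import jax.numpy as jnp

# head_dim limit of the cuDNN flash attention path described in this PR
MAX_CUDNN_HEAD_DIM = 128

def reference_attention(q, k, v):
  """Plain dot-product attention used as the fallback path.

  q, k, v: [batch, seq, num_heads, head_dim]
  """
  scores = jnp.einsum('bqhd,bkhd->bhqk', q, k) / jnp.sqrt(q.shape[-1])
  weights = jax.nn.softmax(scores, axis=-1)
  return jnp.einsum('bhqk,bkhd->bqhd', weights, v)

def can_use_cudnn_flash(q, k):
  """Gate the fused path: head_dim <= 128 and no GQA (equal head counts)."""
  head_dim_ok = q.shape[-1] <= MAX_CUDNN_HEAD_DIM
  no_gqa = q.shape[-2] == k.shape[-2]  # num query heads == num KV heads
  return head_dim_ok and no_gqa
```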
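On the SLURM flag: a hedged sketch of what multiprocess GPU initialization typically looks like under SLURM (the exact wiring behind `--multiprocess_gpu` in MaxText may differ). Recent JAX versions can auto-detect the SLURM cluster environment when `jax.distributed.initialize()` is called with no arguments.

```python
import jax

# Under SLURM, jax.distributed.initialize() can pick up the coordinator
# address, process count, and this task's process id from SLURM
# environment variables (e.g. SLURM_NTASKS, SLURM_PROCID).
jax.distributed.initialize()

print(f"process {jax.process_index()}/{jax.process_count()}, "
      f"local GPUs: {jax.local_device_count()}")
```

Launched with something like `srun --ntasks-per-node=8 python train.py`, each task then drives its local GPUs while participating in the global device mesh.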
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
Hi @rwitten, I have made the changes. Please let me know how it looks. Also, I have been added to the contributors list. My current commit email is verified, but my initial commit email is not. Is there any way to remove it? Just wondering what the easiest way to fix this would be. Thanks!