
added support for cudnn flash attention

Open kocchop opened this issue 1 year ago • 2 comments

  • Implemented cuDNN flash attention with Transformer Engine
  • It currently supports head_dim up to 128 and does not support GQA yet. The API it uses is unstable; it will soon be switched to a stable one with more feature support
  • Added a flag --multiprocess_gpu to support multiprocess SLURM tasks
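
The two limitations listed above (head_dim up to 128, no GQA) could be guarded with a small check before the cuDNN path is selected. This is only a sketch: the function name `can_use_cudnn_flash_attention` and its parameters are hypothetical and are not part of MaxText or Transformer Engine. It assumes GQA means the number of key/value heads differs from the number of query heads.

```python
MAX_HEAD_DIM = 128  # limit stated in this PR description


def can_use_cudnn_flash_attention(
    head_dim: int, num_query_heads: int, num_kv_heads: int
) -> bool:
    """Hypothetical guard reflecting the limitations described in this PR."""
    # head_dim is supported only up to 128.
    if head_dim > MAX_HEAD_DIM:
        return False
    # GQA (assumed here: KV head count differs from query head count)
    # is not supported yet.
    if num_kv_heads != num_query_heads:
        return False
    return True
```

A caller could fall back to another attention implementation whenever this returns False.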

kocchop, Feb 10 '24 01:02

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot], Feb 10 '24 01:02

Hi @rwitten, I have made the changes. Please let me know how it looks. I have also been added to the contributors list. My current commit email is verified, but my initial commit email is not. Is there any way to remove it? I am wondering what the easiest way to fix this would be. Thanks!

kocchop, Feb 17 '24 03:02