TransformerEngine
Add softcap support for flash attention.
Description
Flash attention added support for softcap in commit 8f873cc6; it is used by Gemma2.
Fixes # (issue)
Type of change
- [ ] New feature (non-breaking change which adds functionality)
Changes
Add a `softcap` argument to `FlashAttention` and update `_flash_attn_max_version` to 2.6.1 (see the sketch below).
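As a minimal sketch of what the change enables, the snippet below forwards a softcap value to flash-attn, assuming the flash-attn >= 2.6 Python API where `flash_attn_func` accepts a `softcap` keyword; the helper name and the example value 50.0 (Gemma2's attention logit cap) are illustrative, not the actual TransformerEngine code.

```python
# Sketch: forwarding a softcap value to flash-attn (assumes flash-attn >= 2.6,
# where flash_attn_func accepts a `softcap` keyword argument).
import torch
from flash_attn import flash_attn_func


def attention_with_softcap(q, k, v, softcap=50.0, causal=True):
    """Scaled dot-product attention with logit soft-capping (as used by Gemma2)."""
    # softcap=0.0 disables capping; 50.0 is the value Gemma2 uses for attention logits.
    return flash_attn_func(q, k, v, causal=causal, softcap=softcap)


# Example usage: tensors of shape (batch, seqlen, num_heads, head_dim), fp16 on GPU.
q = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
out = attention_with_softcap(q, k, v)
```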
Checklist:
- [ ] I have read and followed the contributing guidelines
- [ ] The functionality is complete
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
@Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low? Also, please sign your commits (see https://github.com/NVIDIA/TransformerEngine/blob/main/CONTRIBUTING.rst#sign-your-work for details).
OK
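For illustration, here is a hedged sketch of the version check the reviewer suggests: erroring out when the installed flash-attn is too old to support softcap. The threshold 2.6.0, the function name, and the use of `packaging`/`importlib.metadata` are assumptions for this sketch, not the exact TransformerEngine implementation.

```python
# Sketch: raise instead of warning when flash-attn is too old for softcap.
# The 2.6.0 minimum version here is an assumption for illustration.
from importlib.metadata import version

from packaging.version import Version

_flash_attn_version = Version(version("flash-attn"))
_flash_attn_softcap_min_version = Version("2.6.0")


def check_softcap_support(softcap: float) -> None:
    """Error out if softcap is requested but the installed flash-attn cannot provide it."""
    if softcap != 0.0 and _flash_attn_version < _flash_attn_softcap_min_version:
        raise ValueError(
            f"softcap requires flash-attn >= {_flash_attn_softcap_min_version}, "
            f"but found {_flash_attn_version}."
        )
```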