TransformerEngine
Add softcap support for flash attention.
Description
Flash attention added support for softcap in commit 8f873cc6; it is used by Gemma2.
Fixes # (issue)
Type of change
- [ ] New feature (non-breaking change which adds functionality)
Changes
Add a `softcap` argument to `FlashAttention` and update `_flash_attn_max_version` to 2.6.1 (see the sketch below).
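As a minimal sketch of what the change enables, the snippet below forwards a softcap value to flash-attn, assuming the flash-attn >= 2.6 Python API where `flash_attn_func` accepts a `softcap` keyword; the helper name and the example value 50.0 (Gemma2's attention logit cap) are illustrative, not the actual TransformerEngine code.

```python
# Sketch: forwarding a softcap value to flash-attn (assumes flash-attn >= 2.6,
# where flash_attn_func accepts a `softcap` keyword argument).
import torch
from flash_attn import flash_attn_func


def attention_with_softcap(q, k, v, softcap=50.0, causal=True):
    """Scaled dot-product attention with logit soft-capping (as used by Gemma2)."""
    # softcap=0.0 disables capping; 50.0 is the value Gemma2 uses for attention logits.
    return flash_attn_func(q, k, v, causal=causal, softcap=softcap)


# Example usage: tensors of shape (batch, seqlen, num_heads, head_dim), fp16 on GPU.
q = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 128, 8, 64, dtype=torch.float16, device="cuda")
out = attention_with_softcap(q, k, v)
```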
Checklist:
- [ ] I have read and followed the contributing guidelines
- [ ] The functionality is complete
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
@Lzhang-hub Could we maybe, instead of the warning, check the version of flash attention installed and error out if the version number is too low? Also, please sign your commits (see https://github.com/NVIDIA/TransformerEngine/blob/main/CONTRIBUTING.rst#sign-your-work for details).
OK
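For illustration, here is a hedged sketch of the version check the reviewer suggests: erroring out when the installed flash-attn is too old to support softcap. The threshold 2.6.0, the function name, and the use of `packaging`/`importlib.metadata` are assumptions for this sketch, not the exact TransformerEngine implementation.

```python
# Sketch: raise instead of warning when flash-attn is too old for softcap.
# The 2.6.0 minimum version here is an assumption for illustration.
from importlib.metadata import version

from packaging.version import Version

_flash_attn_version = Version(version("flash-attn"))
_flash_attn_softcap_min_version = Version("2.6.0")


def check_softcap_support(softcap: float) -> None:
    """Error out if softcap is requested but the installed flash-attn cannot provide it."""
    if softcap != 0.0 and _flash_attn_version < _flash_attn_softcap_min_version:
        raise ValueError(
            f"softcap requires flash-attn >= {_flash_attn_softcap_min_version}, "
            f"but found {_flash_attn_version}."
        )
```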