
No module named 'flash_attn_2_cuda'

Open TheAyes opened this issue 9 months ago • 2 comments

I'm currently trying to setup flash attn but I seem to receive this error:

Traceback (most recent call last):
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1863, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/0l539chjmcq5kdd43j6dgdjky4sjl7hl-python3-3.12.8/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 50, in <module>
    from .integrations.flash_attention import flash_attention_forward
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/integrations/flash_attention.py", line 5, in <module>
    from ..modeling_flash_attention_utils import _flash_attention_forward
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/modeling_flash_attention_utils.py", line 30, in <module>
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ModuleNotFoundError: No module named 'flash_attn_2_cuda'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1863, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/0l539chjmcq5kdd43j6dgdjky4sjl7hl-python3-3.12.8/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/integrations/integration_utils.py", line 36, in <module>
    from .. import PreTrainedModel, TFPreTrainedModel
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1851, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1865, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
No module named 'flash_attn_2_cuda'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1863, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/0l539chjmcq5kdd43j6dgdjky4sjl7hl-python3-3.12.8/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 42, in <module>
    from .integrations import (
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1851, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1865, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
No module named 'flash_attn_2_cuda'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ayes/IdeaProjects/Iona/src/__init__.py", line 10, in <module>
    from model.trainer import create_trainer
  File "/home/ayes/IdeaProjects/Iona/src/model/trainer.py", line 1, in <module>
    from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1851, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ayes/IdeaProjects/Iona/.venv/lib/python3.12/site-packages/transformers/utils/import_utils.py", line 1865, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
Failed to import transformers.integrations.integration_utils because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
No module named 'flash_attn_2_cuda'

I've tried many different versions and noticed that I don't get this error when using the PyTorch 2.2 CPU build, which isn't ideal and takes way too long. Since I can't find a way to install 2.2 with ROCm, I pretty much have to go with 2.6.0, which, again, results in this error.

I'm pretty new to Python and AI in general, and I'm further limited in my options by using NixOS. What steps should I take to resolve this error?

TheAyes avatar Feb 18 '25 21:02 TheAyes

I ran into the same problem. At first I thought it was because I hadn't run setup.py and had just downloaded the files from GitHub directly, but even after running it for a long time, the module still couldn't be found.

VirgoAsumita avatar Mar 02 '25 03:03 VirgoAsumita

Change the import to import flash_attn_3_cuda as flash_attn_gpu

lumosity4tpj avatar Mar 18 '25 12:03 lumosity4tpj

Faced the same issue. Looked at /flash_attn/flash_attn_interface.py:

# We need to import the CUDA kernels after importing torch
USE_TRITON_ROCM = os.getenv("FLASH_ATTENTION_TRITON_AMD_ENABLE", "FALSE") == "TRUE"
if USE_TRITON_ROCM:
    from .flash_attn_triton_amd import interface_fa as flash_attn_gpu
else:
    import flash_attn_2_cuda as flash_attn_gpu

Just set the FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE environment variable before running your command, and it will take the AMD (Triton) route.
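
For example, a minimal sketch of the import from the traceback above, with the variable set in Python before anything pulls in flash_attn (setting it on the shell command line before the python invocation works the same way):

import os

# Must be set before flash_attn (and therefore transformers) is imported,
# because flash_attn_interface.py reads it with os.getenv at import time.
os.environ["FLASH_ATTENTION_TRITON_AMD_ENABLE"] = "TRUE"

from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling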

samanamp avatar Aug 04 '25 15:08 samanamp