
ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found

Open ffaisal93 opened this issue 6 months ago • 41 comments

Environment

  • OS / node image: CentOS 8 / RHEL 8 derivative
  • glibc version: 2.28 (python -c "import platform, os; print(platform.libc_ver())")
  • GPU: NVIDIA A100 (CUDA 12.6)
  • Python: 3.12 (Conda)
  • PyTorch: 2.7.0.dev+cu126 (nightly)
  • Flash-Attention wheel: 2.7.4.post1
    pip install https://github.com/OpenRLHF/flash-attn-…/flash_attn-2.7.4.post1+pt270cu126…whl

Reproduction

# activate env and run:
python -c "import flash_attn"

Error:

venvs/conda/openrlhf/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:276: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "venvs/conda/openrlhf/lib/python3.12/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "venvs/conda/openrlhf/lib/python3.12/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by venvs/conda/openrlhf/lib/python3.12/site-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so)
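
For reference, the mismatch can be confirmed by comparing the host glibc with the symbol versions the extension module was linked against (the .so path below just mirrors the traceback):

# host glibc (2.28 here)
ldd --version | head -1
# highest GLIBC symbol versions the wheel's extension requires (2.32 here)
objdump -T venvs/conda/openrlhf/lib/python3.12/site-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -3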

Flash attention: pip install https://github.com/OpenRLHF/flash-attn-2.7.4.post1-builds/releases/download/v0.1/flash_attn-2.7.4.post1+pt270cu128cxx11abiTRUE-cp312-cp312-linux_x86_64.whl

Pip list:

einops==0.8.1
filelock==3.18.0
flash_attn==2.7.4.post1
fsspec==2025.5.1
Jinja2==3.1.6
MarkupSafe==3.0.2
mpmath==1.3.0
networkx==3.5
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
pip==25.1
setuptools==78.1.1
sympy==1.14.0
torch==2.7.1
triton==3.3.1
typing_extensions==4.14.0
wheel==0.45.1

ffaisal93 avatar Jun 11 '25 05:06 ffaisal93

I am having the same error. Built flash attention from source yesterday and that appeared to go well.

Traceback (most recent call last):
  File "/home/ai/apps/flash-attention/tests/test_flash_attn.py", line 7, in <module>
    from flash_attn import (
  File "/home/ai/miniconda3/envs/comfyenv/lib/python3.11/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/ai/miniconda3/envs/comfyenv/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: /home/ai/miniconda3/envs/comfyenv/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/ai/miniconda3/envs/comfyenv/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so)

Ubuntu: 25.04, Python: 3.11.11, glibc: 2.41, GPU: RTX 5070 Ti, Driver Version: 570.124.06, CUDA Version: 12.8, PyTorch: 2.7.1+cu126

So I went down the rabbit hole and installed the latest pytorch 2.8.0.dev20250611+cu128 and got a very similar but different error.

Traceback (most recent call last):
  File "/home/ai/apps/flash-attention/tests/test_flash_attn.py", line 7, in <module>
    from flash_attn import (
  File "/home/ai/miniconda3/envs/comfyenv/lib/python3.11/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/ai/miniconda3/envs/comfyenv/lib/python3.11/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: /home/ai/miniconda3/envs/comfyenv/bin/../lib/libstdc++.so.6: version `CXXABI_1.3.15' not found (required by /home/ai/miniconda3/envs/comfyenv/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so)

There is also a really good description of the error and possible solutions in section 3.4 of this page (which I could not get working): https://gcc.gnu.org/onlinedocs/libstdc++/faq.html#faq.how_to_install
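
For the GLIBCXX/CXXABI variant, a minimal check along those lines might look like this (paths are illustrative, and the conda-forge step is an assumption rather than something verified in this thread):

# compare what conda's bundled libstdc++ provides vs. the system copy
strings /home/ai/miniconda3/envs/comfyenv/lib/libstdc++.so.6 | grep GLIBCXX | tail -3
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX | tail -3
# if the system copy is newer, pulling a newer libstdc++ into the env may help
conda install -c conda-forge libstdcxx-ng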

bradison1 avatar Jun 11 '25 21:06 bradison1

I ended up solving this issue by burning the environment down and rebuilding it. For me, FlashAttention built successfully with Python 3.13 and GCC 13/G++ 13 (which I don't believe is the default compiler for that Python version). I still had problems with Python 3.12 for some reason.

I have several versions of GCC and G++ installed, so this article was instrumental in managing those and allowing me to experiment.

https://linuxconfig.org/how-to-switch-between-multiple-gcc-and-g-compiler-versions-on-ubuntu-20-04-lts-focal-fossa

I don't think it was needed, but I also threw these in for good measure:

export CC=/usr/bin/gcc-13
export CXX=/usr/bin/g++-13
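
The article boils down to registering each compiler with update-alternatives and then choosing one interactively; a sketch, assuming gcc-12 and gcc-13 are both installed from the Ubuntu repos:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12 --slave /usr/bin/g++ g++ /usr/bin/g++-12
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 13 --slave /usr/bin/g++ g++ /usr/bin/g++-13
sudo update-alternatives --config gcc   # pick the active version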

Good luck!

bradison1 avatar Jun 13 '25 19:06 bradison1

Same issue for v2.8.0.post2 cu12torch2.7_abi=False. GPU is 5090, torch is v2.7.1+cu128. ubuntu 20.04

zipzou avatar Jun 15 '25 14:06 zipzou

Same issue for v2.8.0.post2 cu12torch2.5_abi=False. GPU is A100, torch is v2.5.1+cu121. ubuntu 20.04

YanxingLiu avatar Jun 16 '25 03:06 YanxingLiu

The issue in v2.8.0.post2 is caused by this commit https://github.com/Dao-AILab/flash-attention/commit/71f7ac258ac193bf2cecd2c82a0d6e22bcba157f

efsotr avatar Jun 16 '25 07:06 efsotr

GitHub runners no longer support Ubuntu 20.04, so we had to compile the wheels on Ubuntu 22.04. The right thing is to have the GitHub runners compile with manylinux, but I haven't figured out how to set that up.

tridao avatar Jun 16 '25 16:06 tridao

But a wheel built against GLIBC 2.32 causes compatibility problems on many systems, and the official torch wheels themselves only require a lower glibc version.

zipzou avatar Jun 17 '25 00:06 zipzou

Right, we'd appreciate help with configuring the github runner to use manylinux https://github.com/Dao-AILab/flash-attention/blob/main/.github/workflows/publish.yml

tridao avatar Jun 17 '25 04:06 tridao

Same issue.

Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
    from openrlhf.trainer.ray import (
  File "/tmp/ray/session_2025-06-18_02-19-51_497600_4018193/runtime_resources/working_dir_files/_ray_pkg_d41598a7255f91ed/openrlhf/trainer/__init__.py", line 1, in <module>
    from .dpo_trainer import DPOTrainer
  File "/tmp/ray/session_2025-06-18_02-19-51_497600_4018193/runtime_resources/working_dir_files/_ray_pkg_d41598a7255f91ed/openrlhf/trainer/dpo_trainer.py", line 5, in <module>
    from flash_attn.utils.distributed import all_gather
  File "/home/linyun/anaconda3/envs/r1-searcher/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/linyun/anaconda3/envs/r1-searcher/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /home/linyun/anaconda3/envs/r1-searcher/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so)

How can I solve this?

dogeeelin avatar Jun 18 '25 02:06 dogeeelin

@dogeeelin Currently, the released wheels are built on Ubuntu 22.04, which has a newer glibc. Until the build environment is fixed, you can build the wheel from source or try another released wheel. With torch>=2.7, we have to compile it ourselves.
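
A minimal from-source install on the older-glibc host might look like this (assuming an nvcc matching your torch CUDA version and a reasonably new g++ are on PATH):

pip install ninja packaging
# MAX_JOBS keeps the build from exhausting RAM; raise it if you have headroom
MAX_JOBS=4 pip install flash-attn --no-build-isolation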

zipzou avatar Jun 18 '25 02:06 zipzou

Same issue for v2.8.0.post2 cu12torch2.5_abi=False. GPU is A100, torch is v2.5.1+cu121. ubuntu 20.04

It worked! https://blog.csdn.net/qq_44817196/article/details/136203069

ykj467422034 avatar Jun 19 '25 04:06 ykj467422034

@dogeeelin Currently, the released wheels are built on Ubuntu 22.04, which has a newer glibc. Until the build environment is fixed, you can build the wheel from source or try another released wheel. With torch>=2.7, we have to compile it ourselves.

Thank you, this worked when I downgraded to flash-attn==2.7.4.post1 compiled with torch <= 2.6 on Ubuntu 20.04: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install flash-attn==2.7.4.post1 --no-build-isolation

rin2401 avatar Jun 19 '25 07:06 rin2401

pip install flash_attn==2.7.4.post1

This downgrades flash_attn from 2.8.1.post2 to 2.7.4.post1 and works on my Ubuntu 20.04, whose GLIBC is 2.31.

ZetaJ7 avatar Jun 23 '25 08:06 ZetaJ7

Installing 2.7.4.post1 worked for me. Also, the cluster environment that I am working on has gcc/8.5.0 as the default, so I needed to load 9.5 or something higher every time to make it work.

ali-kerem avatar Jun 23 '25 14:06 ali-kerem

pip install flash_attn==2.7.4.post1

This downgrades flash_attn from 2.8.1.post2 to 2.7.4.post1 and works on my Ubuntu 20.04, whose GLIBC is 2.31.

In this case, what's your pytorch version?

dogeeelin avatar Jun 24 '25 01:06 dogeeelin

Installing 2.7.4.post1 worked for me. Also, the cluster environment that I am working on has gcc/8.5.0 as the default, so I needed to load 9.5 or something higher every time to make it work.

In this case, what's your pytorch version?

dogeeelin avatar Jun 24 '25 01:06 dogeeelin

pip install flash_attn==2.7.4.post1

This downgrades flash_attn from 2.8.1.post2 to 2.7.4.post1 and works on my Ubuntu 20.04, whose GLIBC is 2.31.

In this case, what's your pytorch version?

torch 2.4.0 with CUDA 12.1

ZetaJ7 avatar Jun 24 '25 02:06 ZetaJ7

Installing 2.7.4.post1 worked for me. Also, the cluster environment that I am working on has gcc/8.5.0 as the default, so I needed to load 9.5 or something higher every time to make it work.

In this case, what's your pytorch version?

2.7.1 with CUDA 12.6

ali-kerem avatar Jun 24 '25 08:06 ali-kerem

I had some issues installing flash attention directly. My setup is torch 2.7 with cuda 12.6. The following approach worked:

https://github.com/Dao-AILab/flash-attention/issues/1644#issuecomment-2899396361

safal312 avatar Jun 28 '25 11:06 safal312

pip install flash_attn==2.7.4.post1

This downgrades flash_attn from 2.8.1.post2 to 2.7.4.post1 and works on my Ubuntu 20.04, whose GLIBC is 2.31.

This works for me. My setup is torch 2.5 with CUDA 12.1.

journey1234-liu avatar Jul 04 '25 09:07 journey1234-liu

pip install flash_attn==2.7.4.post1

you're the hero

jvonrad avatar Jul 05 '25 19:07 jvonrad

Same error for me. I solved this problem faster using pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiTRUE-cp39-cp39-linux_x86_64.whl.

You can select the whl file that matches your environment.
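
For reference, the values encoded in the wheel filename (torch version, CUDA version, cxx11abi flag, CPython tag) can be read off your environment like this:

# torch version, CUDA version, and whether torch was built with the C++11 ABI
python -c "import torch; print(torch.__version__, torch.version.cuda, torch._C._GLIBCXX_USE_CXX11_ABI)"
# CPython tag (e.g. cp312)
python -c "import sys; print('cp%d%d' % sys.version_info[:2])"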

pip install flash_attn==2.7.4.post1

This downgrades flash_attn from 2.8.1.post2 to 2.7.4.post1 and works on my Ubuntu 20.04, whose GLIBC is 2.31.

kmuhan avatar Jul 06 '25 11:07 kmuhan

flash_attn == 2.7.4.post1

It works for me.

hairuoliu1 avatar Jul 09 '25 14:07 hairuoliu1

With all the "high fives" and workaround confirmations, I just wanted to highlight that this problem is still very much alive. The workaround is great if you are on 20.04, but if you are running a Blackwell card with Linux you are forced to use at least PyTorch 2.7 and CUDA 12.8, which is still an issue.

For me, I also had to use Ubuntu 25.04 to even get my 5070 Ti card recognized by the OS, which complicated the GCC/G++ issue. My workaround is posted above, but I am limited to Python 3.13, which presents problems for some software packages.

@tridao's comment appears to be the correct fix, but there doesn't seem to be much progress on it.

bradison1 avatar Jul 11 '25 22:07 bradison1

@bradison1 The best approach seems to be compiling the wheels on a manylinux + CUDA 12.9 environment to support PyTorch 2.7.x and Blackwell GPUs. However, I’m not sure whether GitHub Actions supports this setup.
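
A rough sketch of the idea, just to make it concrete (the image choice here is an assumption, not the project's CI recipe, and a CUDA toolkit would still need to be installed inside the container, which is not shown):

# build inside a manylinux_2_28 container so the wheel links against glibc 2.28
docker run --rm -v "$PWD":/io -w /io quay.io/pypa/manylinux_2_28_x86_64 bash -c '
  /opt/python/cp312-cp312/bin/pip install ninja packaging wheel &&
  /opt/python/cp312-cp312/bin/pip install torch --index-url https://download.pytorch.org/whl/cu128 &&
  /opt/python/cp312-cp312/bin/python setup.py bdist_wheel
'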

zipzou avatar Jul 12 '25 01:07 zipzou

Thanks to everyone who looked into this!

cognivore avatar Jul 14 '25 18:07 cognivore

@dogeeelin Currently, the released wheels are built on Ubuntu 22.04, which has a newer glibc. Until the build environment is fixed, you can build the wheel from source or try another released wheel. With torch>=2.7, we have to compile it ourselves.

Thank you, this worked when I downgraded to flash-attn==2.7.4.post1 compiled with torch <= 2.6 on Ubuntu 20.04: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install flash-attn==2.7.4.post1 --no-build-isolation

Tried this but got the following error when importing flash_attn:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/XXX/miniconda3/envs/RRC_DocVQA/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/XXX/miniconda3/envs/RRC_DocVQA/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ModuleNotFoundError: No module named 'flash_attn_2_cuda'

Has anyone faced the same issue?
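
One thing worth checking here is whether the compiled extension was packaged at all; a quick look (a guess at the cause, not a confirmed fix):

# the extension should show up as a top-level .so in the wheel's file list
pip show -f flash_attn | grep flash_attn_2_cuda
# if it is missing, the install likely fell back to a build that skipped the CUDA
# extension; reinstalling with nvcc on PATH and --no-build-isolation usually rebuilds it
pip install --force-reinstall --no-cache-dir flash-attn==2.7.4.post1 --no-build-isolation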

CQHofsns avatar Jul 23 '25 15:07 CQHofsns

Right, we'd appreciate help with configuring the github runner to use manylinux https://github.com/Dao-AILab/flash-attention/blob/main/.github/workflows/publish.yml

@tridao I am currently testing compilation on the manylinux_2_28 platform and have made substantial progress. Once the build artifacts are validated, I will submit a pull request to this project. This should fundamentally resolve the GLIBC compatibility issues.

zipzou avatar Jul 26 '25 16:07 zipzou

@dogeeelin Currently, the released wheels are built on Ubuntu 22.04, which has a newer glibc. Until the build environment is fixed, you can build the wheel from source or try another released wheel. With torch>=2.7, we have to compile it ourselves.

Thank you, this worked when I downgraded to flash-attn==2.7.4.post1 compiled with torch <= 2.6 on Ubuntu 20.04: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.4.post1

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install flash-attn==2.7.4.post1 --no-build-isolation

Solved!

Ethan16162 avatar Aug 05 '25 08:08 Ethan16162

@zipzou That's great news! Looking forward to your fix.

bradison1 avatar Aug 06 '25 03:08 bradison1