flash-attention
[bug] build is very slow
I compiled from the latest source code, and the build was so slow that I had to fall back to commit 2.5.8. The previous version took about 3-5 minutes to complete (70% CPU and 230 GB memory usage), but with this version the CPU barely does any work. What happened?
Setting MAX_JOBS doesn't get the CPU busy either.
CentOS: 7.9.2009
Python: 3.10.14
GCC: 12.3.0
cmake: 3.27.9
nvcc: 12.2.140
Installing from a prebuilt wheel works fine.
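For reference, a minimal sketch of how MAX_JOBS is normally passed to the build (the value 4 here is only an example):

# cap the number of parallel nvcc/cicc compile jobs during the build
MAX_JOBS=4 pip install flash-attn --no-build-isolation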
Did you install ninja? That sped things up for me considerably.
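In case it helps, a quick sanity check that ninja is installed and actually works (these commands are illustrative):

pip install ninja
ninja --version   # should print a version number
echo $?           # should print 0; if not, reinstall ninja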
I have ninja 1.11.1.1. I don't think that's the cause of the problem, because the compilation speed was fine on the previous commit.
Following your suggestion, I attempted to install version 2.5.7 of flash-attention. However, the build process is still very slow, with CPU usage remaining below 1%. What could be causing this?
pip install flash-attn==2.5.7 --no-build-isolation
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, https://pypi.ngc.nvidia.com
Collecting flash-attn==2.5.7
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/21/cb/33a1f833ac4742c8adba063715bf769831f96d99dbbbb4be1b197b637872/flash_attn-2.5.7.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 54.0 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from flash-attn==2.5.7) (2.3.0+cu118)
Collecting einops (from flash-attn==2.5.7)
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/44/5a/f0b9ad6c0a9017e62d4735daaeb11ba3b6c009d69a26141b258cd37b5588/einops-0.8.0-py3-none-any.whl (43 kB)
━━━━━━━━━━━━━━━━━━━━ 43.2/43.2 kB 82.2 MB/s eta 0:00:00
Requirement already satisfied: packaging in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from flash-attn==2.5.7) (24.0)
Requirement already satisfied: ninja in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from flash-attn==2.5.7) (1.11.1.1)
Requirement already satisfied: filelock in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (3.14.0)
Requirement already satisfied: typing-extensions>=4.8.0 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (4.12.0)
Requirement already satisfied: sympy in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (1.12)
Requirement already satisfied: networkx in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (3.2.1)
Requirement already satisfied: jinja2 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (3.1.3)
Requirement already satisfied: fsspec in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (2024.5.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.8.89 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (11.8.89)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.8.89 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (11.8.89)
Requirement already satisfied: nvidia-cuda-cupti-cu11==11.8.87 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (11.8.87)
Requirement already satisfied: nvidia-cudnn-cu11==8.7.0.84 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (8.7.0.84)
Requirement already satisfied: nvidia-cublas-cu11==11.11.3.6 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (11.11.3.6)
Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (10.9.0.58)
Requirement already satisfied: nvidia-curand-cu11==10.3.0.86 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (10.3.0.86)
Requirement already satisfied: nvidia-cusolver-cu11==11.4.1.48 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (11.4.1.48)
Requirement already satisfied: nvidia-cusparse-cu11==11.7.5.86 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (11.7.5.86)
Requirement already satisfied: nvidia-nccl-cu11==2.20.5 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (2.20.5)
Requirement already satisfied: nvidia-nvtx-cu11==11.8.86 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (11.8.86)
Requirement already satisfied: triton==2.3.0 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from torch->flash-attn==2.5.7) (2.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from jinja2->torch->flash-attn==2.5.7) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /zhuzixuan/conda/envs/bunny/lib/python3.10/site-packages (from sympy->torch->flash-attn==2.5.7) (1.3.0)
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... -
CPU usage during the build (two top snapshots):

top - 08:39:29 up 61 days, 2:08, 1 user
Tasks: 26 total, 1 running, 25 sleeping, 0 stopped
%Cpu(s): 0.5 us, 0.1 sy, 0.0 ni, 99.4 id, 0.0 wa

top - 08:45:03 up 61 days, 2:14, 1 user
Tasks: 24 total, 1 running, 23 sleeping, 0 stopped
%Cpu(s): 1.1 us, 0.2 sy, 0.0 ni, 98.7 id, 0.0 wa

MiB Mem : 515612.6 total, 491855.4 free, 19313.1 used
MiB Swap: 0.0 total, 0.0 free, 0.0 used

  PID USER PR NI    VIRT    RES   SHR S
  386 root 20  0 1364196 498040 44128 S
  668 root 20  0 1045352  74208 38872 S
    1 root 20  0    8408   1756  1472 S
    7 root 20  0   12272   5592  4560 S
   21 root 20  0   18012   4776  3556 S
   33 root 20  0   10212   1828  1544 S
  382 root 20  0    9700   4360  4012 S
  769 root 20  0  853504  53268 38888 S
14750 root 20  0   19496  10636  8960 S
14761 root 20  0    9896   4684  4292 S
Try cloning the repo and running python setup.py install instead. That works most of the time. @CHDev93 @wongdi @HuBocheng
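For reference, that source build looks roughly like this (the MAX_JOBS value is just an illustrative cap to keep memory usage bounded):

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
MAX_JOBS=8 python setup.py install   # limit parallel jobs if RAM is tight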
Have you solved this problem yet? I have encountered the same problem.
My XPS 15 on Windows is taking hours to build...
I had the same issue: building flash-attn was slow and the CPU load was very low. Only 2 instances of the "NVIDIA cicc" process were running at a time. Running "pip install ninja" seems to help, as suggested before. Now I have 10 instances of cicc running and my CPU is at 37% (Ryzen 9 7950X3D), so I guess it will be about 5 times quicker now. (Windows 11, by the way.)
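If you want to check how many cicc compiler processes are actually running, a rough one-liner on Linux (on Windows, Task Manager's Details tab shows the same thing):

ps aux | grep -c "[c]icc"   # counts running cicc instances; the [c] pattern keeps grep from matching itself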
Same issue. I couldn't get it to build all night; after following the suggestion to revert to commit 2.5.8, the CPU was fully utilized and the build succeeded.
Same issue here (32 GB RAM and a 13th-gen i7).
Same issue here (Jetson AGX Orin): pip install flash-attn==2.5.8 --no-build-isolation --verbose --no-cache-dir
Super slow build.
Same issue