flash-attention
Why does `nvidia-cuda-runtime-cu12` not work, and why must `/usr/local/cuda` be version 11.6 or greater?
Installation of this package fails with the error message below:
```
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-316ekinv/flash-attn_8533b39ea95943b09a2457b1e0020eec/setup.py", line 115, in <module>
raise RuntimeError(
RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.
torch.__version__ = 2.2.2+cu121
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```
My `/usr/local/cuda` version is definitely less than 11.6, but as I see it, `/usr/local/cuda` is a CUDA runtime, and the Python package `nvidia-cuda-runtime-cu12` from NVIDIA is also a CUDA runtime. So why does `nvidia-cuda-runtime-cu12` not work, and why must `/usr/local/cuda` be version 11.6 or greater?
Similar problems:
- https://github.com/Dao-AILab/flash-attention/issues/842
- https://github.com/Dao-AILab/flash-attention/issues/825
- https://github.com/Dao-AILab/flash-attention/issues/557
Make sure nvcc has a supported version by running `nvcc -V`.
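For reference, flash-attn's setup.py performs a check along these lines: parse the release number out of `nvcc -V` output and compare it against the minimum. Below is a minimal sketch of that idea (the function name and the sample output string are illustrative, not flash-attn's actual code):

```python
import re

def parse_nvcc_release(nvcc_output: str) -> tuple[int, int]:
    """Extract the (major, minor) release number from `nvcc -V` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise RuntimeError("could not parse nvcc output")
    return int(m.group(1)), int(m.group(2))

# Illustrative sample of what `nvcc -V` prints for a CUDA 11.4 toolkit:
sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Cuda compilation tools, release 11.4, V11.4.152\n"
)

version = parse_nvcc_release(sample)
if version < (11, 6):
    # This branch fires for the 11.4 sample above, mirroring the install error.
    print(f"CUDA {version[0]}.{version[1]} is too old; FlashAttention needs >= 11.6")
```

In a real setup script the output would come from running `nvcc -V` as a subprocess, so the check reflects the toolkit on `PATH` (typically `/usr/local/cuda/bin/nvcc`), not any pip-installed runtime.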
Thank you for your answer, but what I mean is: apart from the installation-time check of the CUDA version, is nvcc used anywhere else? Packages like torch don't check CUDA runtime versions through nvcc.
I came across this too, because I had an older full toolkit (which included nvcc) installed on my system while my conda environment had version 12.1 of the CUDA runtime. I ended up removing both CUDA installations completely and installing the full toolkit, which includes nvcc, from https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0 .
nvcc is used to compile when there isn't an available wheel, or when the user chooses to build from source. I am not sure whether there is another reason for it even when a matching wheel exists; if there isn't, it would make sense to move that check to just before compiling. A lot of people install CUDA when they install PyTorch, and I think when it's done that way, you only get the CUDA runtime, not nvcc.
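That runtime-vs-toolkit split is easy to see by checking which CUDA components are actually visible in an environment. A hedged sketch (the helper name is made up for illustration, and `nvidia-cuda-runtime-cu12` is just one of the per-version package variants):

```python
import shutil
from importlib import metadata

def cuda_environment_report() -> dict:
    """Report which CUDA pieces are visible: the nvcc compiler vs. a pip-installed runtime."""
    report = {
        # None when no full toolkit is on PATH -- exactly the state many
        # PyTorch-only installs are in.
        "nvcc_path": shutil.which("nvcc"),
    }
    try:
        # The pip runtime wheel that comes along with PyTorch's CUDA wheels.
        report["pip_runtime"] = metadata.version("nvidia-cuda-runtime-cu12")
    except metadata.PackageNotFoundError:
        report["pip_runtime"] = None
    return report

# A pip runtime with no nvcc means you can *run* CUDA kernels,
# but not *compile* CUDA extensions such as flash-attn from source.
print(cuda_environment_report())
```

If `nvcc_path` is `None` while `pip_runtime` is set, a source build of flash-attn will fail even though CUDA programs run fine, which matches the behavior discussed above.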