flash-attention
Why does `nvidia-cuda-runtime-cu12` not work, and why must `/usr/local/cuda` be version 11.6 or greater?
Installation of this package fails with the error message below:
```
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-316ekinv/flash-attn_8533b39ea95943b09a2457b1e0020eec/setup.py", line 115, in <module>
raise RuntimeError(
RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.
torch.__version__ = 2.2.2+cu121
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```
My `/usr/local/cuda` version is definitely less than 11.6, but as I see it, `/usr/local/cuda` is a CUDA runtime, and the Python package `nvidia-cuda-runtime-cu12` from NVIDIA is also a CUDA runtime. So why does `nvidia-cuda-runtime-cu12` not work, and why must `/usr/local/cuda` be version 11.6 or greater?
Similar problems:
- https://github.com/Dao-AILab/flash-attention/issues/842
- https://github.com/Dao-AILab/flash-attention/issues/825
- https://github.com/Dao-AILab/flash-attention/issues/557
Make sure nvcc has a supported version by running `nvcc -V`.
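For reference, flash-attn's setup.py performs a check along these lines: parse the release number out of `nvcc -V` output and compare it against the minimum. Below is a minimal sketch of that idea (the function name and the sample output string are illustrative, not flash-attn's actual code):

```python
import re

def parse_nvcc_release(nvcc_output: str) -> tuple[int, int]:
    """Extract the (major, minor) release number from `nvcc -V` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise RuntimeError("could not parse nvcc output")
    return int(m.group(1)), int(m.group(2))

# Illustrative sample of what `nvcc -V` prints for a CUDA 11.4 toolkit:
sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Cuda compilation tools, release 11.4, V11.4.152\n"
)

version = parse_nvcc_release(sample)
if version < (11, 6):
    # This branch fires for the 11.4 sample above, mirroring the install error.
    print(f"CUDA {version[0]}.{version[1]} is too old; FlashAttention needs >= 11.6")
```

In a real setup script the output would come from running `nvcc -V` as a subprocess, so the check reflects the toolkit on `PATH` (typically `/usr/local/cuda/bin/nvcc`), not any pip-installed runtime.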
Thank you for your answer, but what I mean is: apart from the installation-time check of the CUDA version, is nvcc used anywhere else? Packages like torch don't check CUDA runtime versions through nvcc.
I came across this too, because I had an older full toolkit (which included nvcc) installed on my system while my conda environment had version 12.1 of the CUDA runtime. I ended up removing both CUDA installations completely and installing the full toolkit, which includes nvcc, from https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0 .
nvcc is used to compile when there isn't an available wheel, or when the user chooses to build from source. I am not sure whether there is another reason for it even when a matching wheel exists; if there isn't, it would make sense to move that check to just before compiling. A lot of people install CUDA when they install PyTorch, and I think when it's done that way, you only get the CUDA runtime, not nvcc.
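That runtime-vs-toolkit split is easy to see by checking which CUDA components are actually visible in an environment. A hedged sketch (the helper name is made up for illustration, and `nvidia-cuda-runtime-cu12` is just one of the per-version package variants):

```python
import shutil
from importlib import metadata

def cuda_environment_report() -> dict:
    """Report which CUDA pieces are visible: the nvcc compiler vs. a pip-installed runtime."""
    report = {
        # None when no full toolkit is on PATH -- exactly the state many
        # PyTorch-only installs are in.
        "nvcc_path": shutil.which("nvcc"),
    }
    try:
        # The pip runtime wheel that comes along with PyTorch's CUDA wheels.
        report["pip_runtime"] = metadata.version("nvidia-cuda-runtime-cu12")
    except metadata.PackageNotFoundError:
        report["pip_runtime"] = None
    return report

# A pip runtime with no nvcc means you can *run* CUDA kernels,
# but not *compile* CUDA extensions such as flash-attn from source.
print(cuda_environment_report())
```

If `nvcc_path` is `None` while `pip_runtime` is set, a source build of flash-attn will fail even though CUDA programs run fine, which matches the behavior discussed above.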