flash-attention
Flash attention is broken for CUDA 12.x
Despite using the NVIDIA containers with CUDA 12.4 and compiling from source, I still run into the error below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/scratch.btaleka_gpu_1/code/flash-attention/flash_attn/__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File "/home/scratch.btaleka_gpu_1/code/flash-attention/flash_attn/flash_attn_interface.py", line 10, in <module>
import flash_attn_2_cuda as flash_attn_cuda
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
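For anyone hitting the same ImportError, a quick way to confirm the mismatch is to check which CUDA runtime the compiled extension was actually linked against. The sketch below is only illustrative: it assumes a Linux environment with `ldd` on the PATH, and the module name `flash_attn_2_cuda` is taken from the traceback above.

```python
# Minimal diagnostic sketch (assumptions: Linux, ldd available, and the compiled
# extension is discoverable as flash_attn_2_cuda as in the traceback above).
# Locates the extension without importing it, then prints which libcudart it
# links against alongside the CUDA runtime PyTorch was built for.
import importlib.util
import subprocess

import torch

spec = importlib.util.find_spec("flash_attn_2_cuda")
print("extension path:", spec.origin)             # path to flash_attn_2_cuda*.so
print("torch CUDA runtime:", torch.version.cuda)  # e.g. "12.4"

# ldd should list libcudart.so.12; a reference to libcudart.so.11.0 means the
# extension is a stale CUDA 11 build that has to be removed and rebuilt.
subprocess.run(["ldd", spec.origin], check=False)
```

If `ldd` still lists libcudart.so.11.0, the extension on the path is most likely a leftover CUDA 11 build that shadows the freshly compiled one and needs to be removed before reinstalling.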
Flash attention should consider upgrading to the latest container stack without an explicit dependency on a particular CUDA runtime version. Such dependencies are fragile and often break the pipeline once someone tries to upgrade.
Fixes such as those reported in https://github.com/Dao-AILab/flash-attention/issues/208 or https://github.com/Dao-AILab/flash-attention/issues/728 are not correct solutions, especially when compilation from source fails.