flash-attention
How can I install with CUDA 12.1?
I want to install Flash Attention. Both the bare-metal machine and the Docker container run CUDA 12.1. The bare-metal machine has internet access, but the Docker container can only reach the internal network. I would like to download the matching .whl file on the bare-metal machine so that it can be installed directly inside the Docker container. Is there a way to do this?
Yeah, you can just download the wheel compiled with CUDA 12.3; it should be compatible.
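
Since the bare-metal machine has internet access, the simplest route is to download a prebuilt wheel there and copy it into the container. Below is a minimal sketch of that flow in Python; the release version and the `cu122torch2.0cxx11abiFALSE` suffix are assumptions for illustration, so match them against the actual filenames listed on the flash-attention GitHub Releases page before downloading.

```python
# Minimal sketch: fetch a prebuilt flash-attn wheel on the internet-connected
# bare-metal machine, then install it offline inside the Docker container.
# The version and filename suffixes below are assumptions -- check
# https://github.com/Dao-AILab/flash-attention/releases for the real names.
import urllib.request

FLASH_ATTN_VERSION = "2.3.3"  # assumed release; pick one from the Releases page
WHEEL = (
    f"flash_attn-{FLASH_ATTN_VERSION}+cu122"            # CUDA 12.x build; the thread notes 12.x wheels work on 12.1
    "torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl"  # torch 2.0 / CPython 3.8 / Linux x86_64
)
URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/"
    f"v{FLASH_ATTN_VERSION}/{WHEEL}"
)

# Download on the machine that can reach the internet.
urllib.request.urlretrieve(URL, WHEEL)
print(f"downloaded {WHEEL}")

# Then copy the wheel into the container (e.g. `docker cp`) and install it
# without hitting any package index:  pip install --no-index ./<wheel file>
```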
Where are the wheels compatible with CUDA 12.1, torch 2.0.1, and Python 3.8?
See the released .whl files on the project's GitHub Releases page; I tried one and found it was compatible.
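
If you're unsure which wheel tags to look for, you can read them off from the environment itself. This is just a quick informational check; matching these values to a specific wheel filename (e.g. `cp38`, `torch2.0`, `cxx11abiFALSE`) is still something you verify against the names on the Releases page.

```python
# Print the tags that identify which prebuilt wheel matches this environment.
import sys
import torch

print("torch version:", torch.__version__)            # e.g. 2.0.1 -> "torch2.0" wheels
print("torch built with CUDA:", torch.version.cuda)   # e.g. 11.8 / 12.1 -> cu118 / cu12x wheels
print("python tag:", f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp38
print("cxx11 ABI:", torch.compiled_with_cxx11_abi())  # picks the TRUE/FALSE wheel variant
```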