flash-attention
How can I install with CUDA 12.1?
I want to install Flash Attention. Both the bare-metal machine and the Docker container run CUDA 12.1. The bare-metal machine has internet access, but the Docker container can only reach the internal network. I would like to download the matching .whl file on the bare-metal machine so that it can be installed directly inside the Docker container. Is there a way to do this?
Yeah, you can just download the wheel compiled with CUDA 12.3; it should be compatible.
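
Since the bare-metal machine has internet access, the simplest route is to download a prebuilt wheel there and copy it into the container. Below is a minimal sketch of that flow in Python; the release version and the `cu122torch2.0cxx11abiFALSE` suffix are assumptions for illustration, so match them against the actual filenames listed on the flash-attention GitHub Releases page before downloading.

```python
# Minimal sketch: fetch a prebuilt flash-attn wheel on the internet-connected
# bare-metal machine, then install it offline inside the Docker container.
# The version and filename suffixes below are assumptions -- check
# https://github.com/Dao-AILab/flash-attention/releases for the real names.
import urllib.request

FLASH_ATTN_VERSION = "2.3.3"  # assumed release; pick one from the Releases page
WHEEL = (
    f"flash_attn-{FLASH_ATTN_VERSION}+cu122"            # CUDA 12.x build; the thread notes 12.x wheels work on 12.1
    "torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl"  # torch 2.0 / CPython 3.8 / Linux x86_64
)
URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/"
    f"v{FLASH_ATTN_VERSION}/{WHEEL}"
)

# Download on the machine that can reach the internet.
urllib.request.urlretrieve(URL, WHEEL)
print(f"downloaded {WHEEL}")

# Then copy the wheel into the container (e.g. `docker cp`) and install it
# without hitting any package index:  pip install --no-index ./<wheel file>
```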
Where are the wheels compatible with CUDA 12.1, torch 2.0.1, and Python 3.8?
See the released .whl files on the project's GitHub Releases page; I tried one and found it was compatible.
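
If you're unsure which wheel tags to look for, you can read them off from the environment itself. This is just a quick informational check; matching these values to a specific wheel filename (e.g. `cp38`, `torch2.0`, `cxx11abiFALSE`) is still something you verify against the names on the Releases page.

```python
# Print the tags that identify which prebuilt wheel matches this environment.
import sys
import torch

print("torch version:", torch.__version__)            # e.g. 2.0.1 -> "torch2.0" wheels
print("torch built with CUDA:", torch.version.cuda)   # e.g. 11.8 / 12.1 -> cu118 / cu12x wheels
print("python tag:", f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp38
print("cxx11 ABI:", torch.compiled_with_cxx11_abi())  # picks the TRUE/FALSE wheel variant
```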