How to install SageAttention and FlashAttention in Forge?
First of all, I apologize for my English; I'm not a native speaker and I'm still learning. I'm trying to install and enable SageAttention and FlashAttention in Forge, but without success. Is it possible to install and enable SageAttention and FlashAttention in Forge? Thank you for your help.
SageAttention Wheels: https://github.com/woct0rdho/SageAttention/releases/tag/v2.1.1-windows
FlashAttention Wheels: https://github.com/kingbri1/flash-attention/releases/tag/v2.7.4.post1
Make sure the torch/CUDA versions of the wheels match your Forge install. Now, Forge itself doesn't support SageAttention/FlashAttention. However, there is an open PR that adds the functionality.
To grab it, open a terminal/cmd/powershell in your forge directory, and try running this command:
git pull origin pull/2815/head
Make sure to add either --use-flash-attention or --use-sage-attention to your webui-user file's COMMANDLINE_ARGS
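For example (assuming the PR has been pulled and the wheel is installed), the line in your webui-user file would look something like this:
set COMMANDLINE_ARGS=--use-sage-attention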
Hi my friend, thank you very much for trying to help, but unfortunately it didn't work; now I have an error and it won't start. No problem, I'll just do a new installation. I would like to know if you have already managed to do it and, if so, how you did it. I was trying to install it in the C:\stable-diffusion-webui-forge\system\python folder. My Python version is 3.10.6, PyTorch 2.6.0 and CUDA cu126. If you can help I would be grateful, but if you can't I understand perfectly. All the best always
Please share the error if it happens again, that will help us figure out what to do :)
You will want to uninstall torch 2.6 and use torch 2.7, and since you can use CUDA 12.6 you may as well use CUDA 12.8. Also, there are no FlashAttention wheels for CUDA 12.6. Here's a clear installation guide that assumes you have already downloaded the SageAttention and FlashAttention .whl files:
Installing Wheel Files
- Navigate to the embedded Python directory:
cd <drive and root>/stable-diffusion-webui-forge/system/python
Or open the terminal there directly
- Install the wheel files using pip:
./python.exe -m pip install path/to/flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
./python.exe -m pip install path/to/sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl
⚠ Please make sure the Torch, CUDA, and Python (cp310) versions in the file names match your install, as well as the paths to the files ⚠
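If you're not sure which versions your install uses, a quick check from the same python directory tells you what to match the wheel tags against (these are standard commands, nothing Forge-specific):
./python.exe --version
./python.exe -c "import torch; print(torch.__version__, torch.version.cuda)"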
Reinstalling PyTorch
- Uninstall current PyTorch packages:
./python.exe -m pip uninstall -y torch torchvision torchaudio
- Reinstall PyTorch packages with CUDA 12.8 support:
./python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
- Verify the installation:
./python.exe -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
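If CUDA shows as available, you can optionally also confirm which GPU torch sees (standard PyTorch call, nothing Forge-specific):
./python.exe -c "import torch; print(torch.cuda.get_device_name(0))"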
Oh, thank you very much, my friend. I'll try it when I get home from work. Sorry, one more question: do I need to install Triton? If so, how should I proceed? I'm going to do a clean install so I don't run into interference. Thank you very much, you were the only one who was really willing to help.
I downloaded everything my friend. I'm waiting for you :)
Triton is optional, but easy to install. Just go to the python directory and run this command:
python.exe -m pip install triton-windows
This command assumes you're on torch 2.7/cuda 12.8.
EDIT: I see you have the Triton wheel already downloaded. You can also install it directly from the wheel, the same way as SageAttention and FlashAttention.
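If you want to confirm that Triton can actually compile and run a kernel on your GPU, here is a minimal smoke test; the file name triton_test.py is just an example, not part of Forge. Save it next to python.exe and run it with python.exe triton_test.py:
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # each program instance handles one BLOCK-sized chunk of the vectors
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
# launch enough program instances to cover every element
add_kernel[(triton.cdiv(x.numel(), 1024),)](x, y, out, x.numel(), BLOCK=1024)
print("Triton OK:", torch.allclose(out, x + y))
If this prints "Triton OK: True", the compiler toolchain is working.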
I'll start trying \0/
https://drive.google.com/file/d/1oPUwLdR0t1qpMiEwX-MbU9A2aJG9HQ00/view?usp=sharing
It didn't work:
git pull origin pull/2815/head
Just to be clear, it's a clean installation, I downloaded the Forge repository and haven't run anything yet, I just did the installations
Ah, I see, it's related to the git pull. Your Python is fine; the git pull is something specific to the Forge code. Run the command from the directory that contains all the Forge assets, such as webui-user.bat.
Everything went well, thank you very much my friend. I will later write a step-by-step guide to help future people. Thank you very much indeed, may God bless you always. Hugs from Brazil :)
Glad you got it working! :)
I recommend performing the procedure on a clean installation to avoid incompatibilities from other changes. The entire process was done on Windows 11.
1 - Reinstalling PyTorch
Navigate to your Forge installation folder and go to the Python directory.
Example:
C:\webui_forge_cu121_torch231\system\python
In the address bar, type cmd and press ENTER.
Now with the command prompt open, follow the steps below in order:
Upgrade pip: python.exe -m pip install --upgrade pip
Install tqdm: python.exe -m pip install tqdm
Update dependencies: python.exe -m pip install basicsr clean-fid --upgrade --force-reinstall
Uninstall current PyTorch packages: python.exe -m pip uninstall -y torch torchvision torchaudio
Reinstall PyTorch 2.7 packages with CUDA 12.8 support: python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Verify the installation: python.exe -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
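Optionally, also confirm the CUDA build PyTorch was compiled against (it should report 12.8 after the cu128 install): python.exe -c "import torch; print(torch.version.cuda)"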
2 - Downloading Triton, SageAttention, and FlashAttention files
Triton Wheels:
https://pypi.org/project/triton-windows/3.3.0.post19/#files
(Download the file named: triton_windows-3.3.0.post19-cp310-cp310-win_amd64.whl)
SageAttention Wheels:
https://github.com/woct0rdho/SageAttention/releases/tag/v2.1.1-windows
(Download the file named: sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl)
FlashAttention Wheels:
https://github.com/kingbri1/flash-attention/releases/tag/v2.7.4.post1
(Download the file named: flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl)
Download Python310includes:
https://huggingface.co/kim512/flash_attn-2.7.4.post1/blob/main/Python310includes.zip
Move the downloaded files to the root folder of your Forge installation for easier access when installing.
Example:
C:\webui_forge_cu121_torch231
3 - Extracting Python310includes
Navigate to your Forge installation folder and go to the Python directory.
Example:
C:\webui_forge_cu121_torch231\system\python
Extract the contents of Python310includes.zip into the python folder.
4 - Installing Triton, SageAttention, FlashAttention, and Xformers
Navigate to your Forge installation folder and go to the Python directory.
Example:
C:\webui_forge_cu121_torch231\system\python
In the address bar, type cmd and press ENTER.
Now with the command prompt open, follow the steps below in order:
Install Triton: python.exe -m pip install <path to the Triton .whl file>
Example: python.exe -m pip install C:\webui_forge_cu121_torch231\triton_windows-3.3.0.post19-cp310-cp310-win_amd64.whl
Install SageAttention: python.exe -m pip install <path to the SageAttention .whl file>
Example: python.exe -m pip install C:\webui_forge_cu121_torch231\sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl
Install FlashAttention: python.exe -m pip install <path to the FlashAttention .whl file>
Example: python.exe -m pip install C:\webui_forge_cu121_torch231\flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
Install Xformers: python.exe -m pip install xformers
5 - Verifying if everything is installed
Navigate to your Forge installation folder and go to the Python directory.
Example:
C:\webui_forge_cu121_torch231\system\python
In the address bar, type cmd and press ENTER.
Type the following command: python.exe -m pip list
Check the list to ensure everything is installed.
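Beyond pip list, you can also confirm that each package actually imports from this Python; this one-liner is just a convenience check, not something Forge requires: python.exe -c "import triton, sageattention, flash_attn, xformers; print('all imports OK')"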
6 - Enabling SageAttention and FlashAttention
Navigate to the Forge installation folder and go to the webui folder:
Example:
C:\webui_forge_cu121_torch231\webui
In the address bar, type cmd and press ENTER.
With the command prompt open, type:
git pull origin pull/2815/head
After the files are downloaded, close the command prompt window. Still in the webui folder, locate the file webui-user.bat, right-click it and choose Edit to open it in Notepad.
In the field set COMMANDLINE_ARGS=, you can choose SageAttention using --use-sage-attention or FlashAttention using --use-flash-attention.
Examples:
set COMMANDLINE_ARGS=--use-sage-attention
or
set COMMANDLINE_ARGS=--use-flash-attention
To use Xformers, no changes are necessary.
Video tutorial
https://youtu.be/rGgB_6i5IIQ
I installed torch 2.7 with CUDA 12.8, Triton and SageAttention, but got this issue when using --use-sage-attention. Any clue what the issue is?
error: unrecognized arguments: --use-sage-attention
Sorry, I missed the git pull.
(venv) D:\IA\stable-diffusion-webui-forge>git pull origin pull/2815/head
remote: Enumerating objects: 80, done.
remote: Counting objects: 100% (46/46), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 80 (delta 41), reused 26 (delta 26), pack-reused 34 (from 3)
Unpacking objects: 100% (80/80), 50.05 KiB | 249.00 KiB/s, done.
From https://github.com/lllyasviel/stable-diffusion-webui-forge
 * branch            refs/pull/2815/head -> FETCH_HEAD
Updating 17a42e58..ec93aabe
Fast-forward
 backend/args.py                     |    2 +
 backend/attention.py                | 1120 +++++++++++---------
 backend/memory_management.py        |    5 +
 .../scripts/preprocessor_inpaint.py |   44 +
 modules/shared_gradio_themes.py     |   17 +
 5 files changed, 687 insertions(+), 501 deletions(-)
Now it is good. I see "Using sage attention" during startup.
@pauloatx Thank you so very much for your very detailed, step by step writeup! It was very useful
@nlienard Sorry for the delay, my friend. I see that you have the virtual environment active (venv). The installation must be done without the virtual environment enabled. Please follow the written tutorial and the video tutorial; although it is in Portuguese, the steps are very clear.
@Fylifa thanks my friend
Managed to get SageAttention to load correctly (CUDA 12.8 and torch 2.8.0), but I am not seeing any speed improvement with Flux... am I missing something? Even though it says "Using sage attention" when launching Forge, generation time is exactly the same as before.
Try FlashAttention, and follow my video where TeaCache is also added. I hope it helps.