
how to install SageAttention and FlashAttention in forge?

pauloatx opened this issue 7 months ago • 13 comments

First of all, I apologize for my English; I'm not a native speaker and I'm still learning. I'm trying to install and enable SageAttention and FlashAttention in Forge, but without success. Is it possible to install and enable them in Forge? Thank you for your help.

pauloatx avatar May 09 '25 04:05 pauloatx

SageAttention Wheels: https://github.com/woct0rdho/SageAttention/releases/tag/v2.1.1-windows
FlashAttention Wheels: https://github.com/kingbri1/flash-attention/releases/tag/v2.7.4.post1

Make sure the wheels match the torch/CUDA versions of your Forge install. Now, Forge itself doesn't support SageAttn/FlashAttn; however, there is an open PR that adds the functionality.

To grab it, open a terminal/cmd/powershell in your forge directory, and try running this command:

git pull origin pull/2815/head
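
To confirm the pull applied (an optional check; git log is plain git, nothing Forge-specific), the newest commit should now come from the PR rather than the main branch:

git log -1 --oneline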

Make sure to add either --use-flash-attention or --use-sage-attention to your webui-user file's COMMANDLINE_ARGS

MisterChief95 avatar May 09 '25 07:05 MisterChief95

Hi my friend, thank you very much for trying to help. Unfortunately it didn't work; now I have an error and it won't start. No problem, I'll just do a new installation. Could you tell me if you have already managed to do it, and if so, how? I was trying to install it in the C:\stable-diffusion-webui-forge\system\python folder. My Python version is 3.10.6, with PyTorch 2.6.0 and CUDA cu126. If you can help I would be grateful, but if you can't I understand perfectly. All the best always.

pauloatx avatar May 09 '25 10:05 pauloatx

Please share the error if it happens again, that will help us figure out what to do :)

You will want to uninstall torch 2.6 and use torch 2.7, and since you can run CUDA 12.6 you may as well use CUDA 12.8. Also, there are no FlashAttention wheels for CUDA 12.6. Here's a clear installation guide that assumes you have already downloaded the SageAttention and FlashAttention .whl files:

Installing Wheel Files

  1. Navigate to the embedded Python directory:
    cd <drive and root>/stable-diffusion-webui-forge/system/python
    

Or open the terminal there directly.

  2. Install the wheel files using pip:
    ./python.exe -m pip install path/to/flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
    ./python.exe -m pip install path/to/sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl
    

⚠ Please make sure the Torch, CUDA, and Python (cp310) versions in the file names are correct, as well as the path to the files ⚠
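As a worked example of reading those names, the flash_attn wheel above breaks down as: flash_attn-2.7.4.post1 (package and version), cu128 (built against CUDA 12.8), torch2.7.0 (built against torch 2.7.0), cp310 (CPython 3.10), win_amd64 (64-bit Windows). Every one of those fields has to match your install.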

Reinstalling PyTorch

  1. Uninstall current PyTorch packages:

    ./python.exe -m pip uninstall -y torch torchvision torchaudio
    
  2. Reinstall PyTorch packages with CUDA 12.8 support:

    ./python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
    
  3. Verify the installation:

    ./python.exe -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
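
As an extra check that the attention wheels themselves load against the new torch (a minimal one-liner; the success message is just illustrative):

    ./python.exe -c "import flash_attn, sageattention; print('flash_attn and sageattention import OK')"

If either import fails, the Torch/CUDA/Python versions in the wheel file names most likely don't match your install.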
    

MisterChief95 avatar May 09 '25 11:05 MisterChief95

Oh, thank you very much, my friend. I'll try it when I get home from work. Sorry, one more question: do I need to install Triton? If so, how should I proceed? I'm going to do a clean install so nothing else interferes. Thank you very much; you were the only one who was really willing to help.

pauloatx avatar May 09 '25 16:05 pauloatx

[Image: screenshot of the downloaded wheel files]

I downloaded everything my friend. I'm waiting for you :)

pauloatx avatar May 10 '25 18:05 pauloatx

Triton is optional, but easy to install. Just go to the python directory and run this command:

python.exe -m pip install triton-windows

This command assumes you're on torch 2.7/cuda 12.8.
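
To confirm Triton is importable afterwards (optional; __version__ is a standard attribute of the triton package):

python.exe -c "import triton; print(triton.__version__)"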

EDIT: I see you already have the Triton wheel downloaded. You can also install it directly, the same way as SageAttention and FlashAttention.

MisterChief95 avatar May 10 '25 20:05 MisterChief95

I'll start trying \0/

pauloatx avatar May 11 '25 01:05 pauloatx

https://drive.google.com/file/d/1oPUwLdR0t1qpMiEwX-MbU9A2aJG9HQ00/view?usp=sharing

It didn't work:

git pull origin pull/2815/head

Just to be clear, it's a clean installation. I downloaded the Forge repository and haven't run anything yet; I only did the installations.

pauloatx avatar May 11 '25 02:05 pauloatx

Ah, I see, it's related to the git pull. Your Python is fine; the git pull is something specific to the Forge code. Run the command from the directory where all the Forge assets are, such as webui-user.bat.
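
For example (assuming the default install path mentioned earlier in this thread):

cd C:\stable-diffusion-webui-forge
git pull origin pull/2815/head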

MisterChief95 avatar May 11 '25 06:05 MisterChief95

[Images: screenshots of the successful installation]

Everything went well, thank you very much my friend. I will write a step-by-step guide later to help future people. Thank you very much indeed, may God bless you always. Hugs from Brazil :)

pauloatx avatar May 11 '25 18:05 pauloatx

Glad you got it working! :)

MisterChief95 avatar May 11 '25 23:05 MisterChief95

I recommend performing the procedure on a clean installation to avoid incompatibilities from other changes. The entire process was done on Windows 11.


1 - Reinstalling PyTorch

Navigate to your Forge installation folder and go to the Python directory.

Example:
C:\webui_forge_cu121_torch231\system\python

In the address bar, type cmd and press ENTER.

Now with the command prompt open, follow the steps below in order:

Upgrade pip: python.exe -m pip install --upgrade pip

Install tqdm: python.exe -m pip install tqdm

Update dependencies: python.exe -m pip install basicsr clean-fid --upgrade --force-reinstall

Uninstall current PyTorch packages: python.exe -m pip uninstall -y torch torchvision torchaudio

Reinstall PyTorch 2.7 packages with CUDA 12.8 support: python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

Verify the installation: python.exe -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
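
Optionally, confirm which GPU torch sees (torch.cuda.get_device_name is part of the standard torch API):

python.exe -c "import torch; print(torch.cuda.get_device_name(0))"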


2 - Downloading Triton, SageAttention, and FlashAttention files

Triton Wheels:
https://pypi.org/project/triton-windows/3.3.0.post19/#files
(Download the file named: triton_windows-3.3.0.post19-cp310-cp310-win_amd64.whl)

SageAttention Wheels:
https://github.com/woct0rdho/SageAttention/releases/tag/v2.1.1-windows
(Download the file named: sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl)

FlashAttention Wheels:
https://github.com/kingbri1/flash-attention/releases/tag/v2.7.4.post1
(Download the file named: flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl)

Download Python310includes (this provides the Python header and library files that Forge's embedded Python lacks, and which Triton needs to compile its kernels):
https://huggingface.co/kim512/flash_attn-2.7.4.post1/blob/main/Python310includes.zip

Move the downloaded files to the root folder of your Forge installation for easier access when installing.

Example:
C:\webui_forge_cu121_torch231


3 - Extracting Python310includes

Navigate to your Forge installation folder and go to the Python directory.

Example:
C:\webui_forge_cu121_torch231\system\python

Extract the contents of Python310includes.zip into the python folder.
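
Assuming the zip provides the standard include and libs folders (which is what Triton looks for when compiling), you can confirm the extraction from a command prompt in the python folder:

dir include libs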


4 - Installing Triton, SageAttention, FlashAttention, and Xformers

Navigate to your Forge installation folder and go to the Python directory.

Example:
C:\webui_forge_cu121_torch231\system\python

In the address bar, type cmd and press ENTER.

Now with the command prompt open, follow the steps below in order:

Install Triton: python.exe -m pip install <path to the Triton wheel>
Example: python.exe -m pip install C:\webui_forge_cu121_torch231\triton_windows-3.3.0.post19-cp310-cp310-win_amd64.whl

Install SageAttention: python.exe -m pip install <path to the SageAttention wheel>
Example: python.exe -m pip install C:\webui_forge_cu121_torch231\sageattention-2.1.1+cu128torch2.7.0-cp310-cp310-win_amd64.whl

Install FlashAttention: python.exe -m pip install <path to the FlashAttention wheel>
Example: python.exe -m pip install C:\webui_forge_cu121_torch231\flash_attn-2.7.4.post1+cu128torch2.7.0cxx11abiFALSE-cp310-cp310-win_amd64.whl

Install Xformers: python.exe -m pip install xformers
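
After installing, xformers ships a built-in info module you can run to confirm it detects your GPU and torch build:

python.exe -m xformers.info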


5 - Verifying if everything is installed

Navigate to your Forge installation folder and go to the Python directory.

Example:
C:\webui_forge_cu121_torch231\system\python

In the address bar, type cmd and press ENTER.

Type the following command: python.exe -m pip list

Check the list to ensure everything is installed.
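
If the list is long, you can filter it with findstr (a standard Windows command; the quoted words match any of the package names):

python.exe -m pip list | findstr /i "torch triton sage flash xformers"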


6 - Enabling SageAttention and FlashAttention

Navigate to the Forge installation folder and go to the webui folder:

Example:
C:\webui_forge_cu121_torch231\webui

In the address bar, type cmd and press ENTER.

With the command prompt open, type:

git pull origin pull/2815/head

After the files are downloaded, close the command prompt window. Still in the webui folder, locate the file webui-user.bat, right-click it and choose Edit to open it in Notepad.

On the set COMMANDLINE_ARGS= line, you can enable SageAttention with --use-sage-attention or FlashAttention with --use-flash-attention.

Examples:

set COMMANDLINE_ARGS=--use-sage-attention

or

set COMMANDLINE_ARGS=--use-flash-attention

To use Xformers, no changes are necessary.
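
As a final sanity check covering everything installed above (a minimal one-liner run from the system\python folder as before; the success message is just illustrative):

python.exe -c "import torch, triton, sageattention, flash_attn, xformers; print('all packages import OK')"

When Forge starts with one of the flags set, the console should also print a line such as "Using sage attention".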

pauloatx avatar May 13 '25 20:05 pauloatx

Video tutorial

https://youtu.be/rGgB_6i5IIQ

pauloatx avatar May 16 '25 03:05 pauloatx

I installed torch 2.7 with CUDA 12.8, Triton, and SageAttention, but got this issue when using --use-sage-attention. Any clue what the issue is?

error: unrecognized arguments: --use-sage-attention

nlienard avatar May 20 '25 06:05 nlienard

Sorry, I missed the git pull.

    (venv) D:\IA\stable-diffusion-webui-forge>git pull origin pull/2815/head
    remote: Enumerating objects: 80, done.
    remote: Counting objects: 100% (46/46), done.
    remote: Compressing objects: 100% (20/20), done.
    remote: Total 80 (delta 41), reused 26 (delta 26), pack-reused 34 (from 3)
    Unpacking objects: 100% (80/80), 50.05 KiB | 249.00 KiB/s, done.
    From https://github.com/lllyasviel/stable-diffusion-webui-forge
     * branch            refs/pull/2815/head -> FETCH_HEAD
    Updating 17a42e58..ec93aabe
    Fast-forward
     backend/args.py                     |    2 +
     backend/attention.py                | 1120 +++++++++++---------
     backend/memory_management.py        |    5 +
     .../scripts/preprocessor_inpaint.py |   44 +
     modules/shared_gradio_themes.py     |   17 +
     5 files changed, 687 insertions(+), 501 deletions(-)

Now it is good. I see "Using sage attention" during startup.

nlienard avatar May 20 '25 06:05 nlienard

@pauloatx Thank you so very much for your very detailed, step-by-step write-up! It was very useful.

Fylifa avatar May 20 '25 23:05 Fylifa

@nlienard Sorry for the delay, my friend. I saw that you have the virtual environment active (venv). The installation must be done without the virtual environment enabled. Please follow the written tutorial and the video tutorial; although it is in Portuguese, it has very clear steps.

pauloatx avatar May 21 '25 23:05 pauloatx

@Fylifa thanks my friend

pauloatx avatar May 21 '25 23:05 pauloatx

Managed to get sage attention to load correctly (CUDA 12.8 and torch 2.8.0), but I am not seeing any speed improvement with Flux... am I missing something? Even though it says "Using sage attention" when launching Forge, generation time is exactly the same as before.

pellaaa93 avatar May 28 '25 19:05 pellaaa93

Try flash attention, and follow my video where TeaCache is also added. I hope it helps.

pauloatx avatar Jun 08 '25 04:06 pauloatx