stable-diffusion-webui-forge Runtime Error

I am getting this error. I have installed Triton, but the error is still the same.

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-659-gc055f2d4
Commit hash: c055f2d43b07cbfd87ac3da4899a6d7ee52ebab9
Installing requirements
loading WD14-tagger reqs from L:\webui_forge_cu121_torch231\webui\extensions\stable-diffusion-webui-wd14-tagger\requirements.txt
Checking WD14-tagger requirements.
Launching Web UI with arguments: --xformers --cuda-malloc --opt-sdp-no-mem-attention --medvram
Arg --medvram is removed in Forge.
Now memory management is fully automatic and you do not need any command flags.
Please just remove this flag.
In extreme cases, if you want to force previous lowvram/medvram behaviors, please use --always-offload-from-vram
Using cudaMallocAsync backend.
Total VRAM 8188 MB, total RAM 32472 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Laptop GPU : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\windows_utils.py:315: UserWarning: Failed to find Python libs.
warnings.warn("Failed to find Python libs.")
C:/Users/BULUT~1.HAR/AppData/Local/Temp/tmpteqycs5k/cuda_utils.c:14: error: include file 'Python.h' not found
Failed to compile.
cc_cmd: ['L:\\webui_forge_cu121_torch231\\system\\python\\Lib\\site-packages\\triton\\runtime\\tcc\\tcc.exe', 'C:\\Users\\BULUT~1.HAR\\AppData\\Local\\Temp\\tmpteqycs5k\\cuda_utils.c', '-O3', '-shared', '-Wno-psabi', '-o', 'C:\\Users\\BULUT~1.HAR\\AppData\\Local\\Temp\\tmpteqycs5k\\cuda_utils.cp310-win_amd64.pyd', '-fPIC', '-lcuda', '-lpython3', '-LL:\\webui_forge_cu121_torch231\\system\\python\\Lib\\site-packages\\triton\\backends\\nvidia\\lib', '-LC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6\\lib\\x64', '-IL:\\webui_forge_cu121_torch231\\system\\python\\Lib\\site-packages\\triton\\backends\\nvidia\\include', '-IC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6\\include', '-IC:\\Users\\BULUT~1.HAR\\AppData\\Local\\Temp\\tmpteqycs5k', '-IL:\\webui_forge_cu121_torch231\\system\\python\\Include'] C:/Users/BULUT~1.HAR/AppData/Local/Temp/tmpopo1fhjl/cuda_utils.c:14: error: include file 'Python.h' not found Failed to compile. cc_cmd: ['L:\\webui_forge_cu121_torch231\\system\\python\\Lib\\site-packages\\triton\\runtime\\tcc\\tcc.exe', 'C:\\Users\\BULUT~1.HAR\\AppData\\Local\\Temp\\tmpopo1fhjl\\cuda_utils.c', '-O3', '-shared', '-Wno-psabi', '-o', 'C:\\Users\\BULUT~1.HAR\\AppData\\Local\\Temp\\tmpopo1fhjl\\cuda_utils.cp310-win_amd64.pyd', '-fPIC', '-lcuda', '-lpython3', '-LL:\\webui_forge_cu121_torch231\\system\\python\\Lib\\site-packages\\triton\\backends\\nvidia\\lib', '-LC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6\\lib\\x64', '-IL:\\webui_forge_cu121_torch231\\system\\python\\Lib\\site-packages\\triton\\backends\\nvidia\\include', '-IC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6\\include', '-IC:\\Users\\BULUT~1.HAR\\AppData\\Local\\Temp\\tmpopo1fhjl', '-IL:\\webui_forge_cu121_torch231\\system\\python\\Include'] L:\webui_forge_cu121_torch231\system\python\lib\site-packages\transformers\utils\hub.py:128: FutureWarning: Using TRANSFORMERS_CACHEis deprecated and will be removed in v5 of Transformers. 
UseHF_HOME` instead. warnings.warn( C:/Users/BULUT~1.HAR/AppData/Local/Temp/tmpsvvei_go/cuda_utils.c:14: error: include file 'Python.h' not found Failed to compile. cc_cmd: ['L:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\runtime\tcc\tcc.exe', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.c', '-O3', '-shared', '-Wno-psabi', '-o', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.cp310-win_amd64.pyd', '-fPIC', '-lcuda', '-lpython3', '-LL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\lib', '-LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64', '-IL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\include', '-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include', '-IC:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go', '-IL:\webui_forge_cu121_torch231\system\python\Include'] Traceback (most recent call last): File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\utils\import_utils.py", line 853, in get_module return importlib.import_module("." 
+ module_name, self.name) File "importlib_init.py", line 126, in import_module File "", line 1050, in _gcd_import File "", line 1027, in _find_and_load File "", line 992, in _find_and_load_unlocked File "", line 241, in _call_with_frames_removed File "", line 1050, in gcd_import File "", line 1027, in find_and_load File "", line 1006, in find_and_load_unlocked File "", line 688, in load_unlocked File "", line 883, in exec_module File "", line 241, in call_with_frames_removed File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\models\autoencoders_init.py", line 1, in from .autoencoder_asym_kl import AsymmetricAutoencoderKL File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\models\autoencoders\autoencoder_asym_kl.py", line 22, in from ..modeling_utils import ModelMixin File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\models\modeling_utils.py", line 35, in from ..quantizers import DiffusersAutoQuantizer, DiffusersQuantizer File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\quantizers_init.py", line 15, in from .auto import DiffusersAutoQuantizer File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\quantizers\auto.py", line 21, in from .bitsandbytes import BnB4BitDiffusersQuantizer, BnB8BitDiffusersQuantizer File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\quantizers\bitsandbytes_init.py", line 2, in from .utils import dequantize_and_replace, dequantize_bnb_weight, replace_with_bnb_linear File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\quantizers\bitsandbytes\utils.py", line 32, in import bitsandbytes as bnb File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\bitsandbytes_init.py", line 15, in from .nn import modules File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\bitsandbytes\nn_init.py", line 21, in from .triton_based_modules import ( File 
"L:\webui_forge_cu121_torch231\system\python\lib\site-packages\bitsandbytes\nn\triton_based_modules.py", line 6, in from bitsandbytes.triton.dequantize_rowwise import dequantize_rowwise File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\bitsandbytes\triton\dequantize_rowwise.py", line 36, in def _dequantize_rowwise( File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\runtime\autotuner.py", line 378, in decorator return Autotuner(fn, fn.arg_names, configs, key, reset_to_zero, restore_value, pre_hook=pre_hook, File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\runtime\autotuner.py", line 130, in init self.do_bench = driver.active.get_benchmarker() File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\runtime\driver.py", line 23, in getattr self._initialize_obj() File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\runtime\driver.py", line 20, in _initialize_obj self._obj = self._init_fn() File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\runtime\driver.py", line 9, in _create_driver return actives0 File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\backends\nvidia\driver.py", line 576, in init self.utils = CudaUtils() # TODO: make static File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\backends\nvidia\driver.py", line 101, in init mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils") File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\backends\nvidia\driver.py", line 74, in compile_module_from_src so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries) File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\runtime\build.py", line 100, in _build raise e File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\triton\runtime\build.py", line 97, in _build ret = 
subprocess.check_call(cc_cmd) File "subprocess.py", line 369, in check_call subprocess.CalledProcessError: Command '['L:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\runtime\tcc\tcc.exe', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.c', '-O3', '-shared', '-Wno-psabi', '-o', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.cp310-win_amd64.pyd', '-fPIC', '-lcuda', '-lpython3', '-LL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\lib', '-LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64', '-IL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\include', '-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include', '-IC:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go', '-IL:\webui_forge_cu121_torch231\system\python\Include']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\utils\import_utils.py", line 853, in get_module return importlib.import_module("." + module_name, self.name) File "importlib_init.py", line 126, in import_module File "", line 1050, in _gcd_import File "", line 1027, in _find_and_load File "", line 1006, in _find_and_load_unlocked File "", line 688, in _load_unlocked File "", line 883, in exec_module File "", line 241, in _call_with_frames_removed File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 44, in from ..models import AutoencoderKL File "", line 1075, in _handle_fromlist File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\utils\import_utils.py", line 843, in getattr module = self._get_module(self._class_to_module[name]) File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\utils\import_utils.py", line 855, in _get_module raise RuntimeError( RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback): Command '['L:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\runtime\tcc\tcc.exe', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.c', '-O3', '-shared', '-Wno-psabi', '-o', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.cp310-win_amd64.pyd', '-fPIC', '-lcuda', '-lpython3', '-LL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\lib', '-LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64', '-IL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\include', '-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include', '-IC:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go', '-IL:\webui_forge_cu121_torch231\system\python\Include']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "L:\webui_forge_cu121_torch231\webui\launch.py", line 54, in main() File "L:\webui_forge_cu121_torch231\webui\launch.py", line 50, in main start() File "L:\webui_forge_cu121_torch231\webui\modules\launch_utils.py", line 546, in start import webui File "L:\webui_forge_cu121_torch231\webui\webui.py", line 23, in initialize.imports() File "L:\webui_forge_cu121_torch231\webui\modules\initialize.py", line 32, in imports from modules import processing, gradio_extensions, ui # noqa: F401 File "L:\webui_forge_cu121_torch231\webui\modules\processing.py", line 19, in from modules import devices, prompt_parser, masking, sd_samplers, lowvram, infotext_utils, extra_networks, sd_vae_approx, scripts, sd_samplers_common, sd_unet, errors, rng, profiling File "L:\webui_forge_cu121_torch231\webui\modules\sd_samplers.py", line 5, in from modules import sd_samplers_kdiffusion, sd_samplers_timesteps, sd_samplers_lcm, shared, sd_samplers_common, sd_schedulers File "L:\webui_forge_cu121_torch231\webui\modules\sd_samplers_kdiffusion.py", line 5, in from modules import sd_samplers_common, sd_samplers_extra, sd_samplers_cfg_denoiser, sd_schedulers, devices File "L:\webui_forge_cu121_torch231\webui\modules\sd_samplers_common.py", line 6, in from modules import devices, images, sd_vae_approx, sd_samplers, sd_vae_taesd, shared, sd_models File "L:\webui_forge_cu121_torch231\webui\modules\sd_models.py", line 21, in from backend.loader import forge_loader File "L:\webui_forge_cu121_torch231\webui\backend\loader.py", line 9, in from diffusers import DiffusionPipeline File "", line 1075, in _handle_fromlist File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\utils\import_utils.py", line 844, in getattr value = getattr(module, name) File "L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\utils\import_utils.py", line 843, in getattr module = self._get_module(self._class_to_module[name]) File 
"L:\webui_forge_cu121_torch231\system\python\lib\site-packages\diffusers\utils\import_utils.py", line 855, in _get_module raise RuntimeError( RuntimeError: Failed to import diffusers.pipelines.pipeline_utils because of the following error (look up to see its traceback): Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback): Command '['L:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\runtime\tcc\tcc.exe', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.c', '-O3', '-shared', '-Wno-psabi', '-o', 'C:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go\cuda_utils.cp310-win_amd64.pyd', '-fPIC', '-lcuda', '-lpython3', '-LL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\lib', '-LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\lib\x64', '-IL:\webui_forge_cu121_torch231\system\python\Lib\site-packages\triton\backends\nvidia\include', '-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include', '-IC:\Users\BULUT~1.HAR\AppData\Local\Temp\tmpsvvei_go', '-IL:\webui_forge_cu121_torch231\system\python\Include']' returned non-zero exit status 1. Press any key to continue . . .`

Apr 29 '25 18:04 bulutharbeli

Your webui/Python install doesn't seem to be set up correctly. Did you set it up with a venv, or are you using your global Python? Going by the folder name, you used the pre-made release. Did you update the WebUI? Does it work if you uninstall Triton?
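
A quick way to test the "not set up correctly" theory is to check whether the interpreter Forge runs actually ships the C headers that the log says are missing (`Python.h`). The snippet below is only a sketch: it assumes you run it with the same Python that Forge uses (in the log above, the one under `system\python`), and the function name is made up for illustration.

```python
import sysconfig
from pathlib import Path


def python_header_status():
    """Return (include_dir, has_python_h) for the running interpreter."""
    # sysconfig reports where this interpreter expects its C headers to live.
    include_dir = Path(sysconfig.get_paths()["include"])
    return include_dir, (include_dir / "Python.h").is_file()


if __name__ == "__main__":
    include_dir, has_header = python_header_status()
    print(f"include dir: {include_dir}")
    print(f"Python.h present: {has_header}")
```

If this reports that `Python.h` is absent, that would be consistent with the `include file 'Python.h' not found` error from the bundled compiler, though it does not by itself prove that this is the only problem.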

Apr 29 '25 23:04 MisterChief95

> Your webui/Python install doesn't seem to be set up correctly. Did you set it up with a venv, or are you using your global Python? Going by the folder name, you used the pre-made release. Did you update the WebUI? Does it work if you uninstall Triton?

Hello, thanks for your quick response. It's not a fresh installation; it was installed a long time ago. I made all updates and uninstalled Triton, but no luck, still the same.

Apr 30 '25 06:04 bulutharbeli

Have you always had a ~ in your Windows username? That's the only thing that really jumps out at me, and it could be interfering with some path resolution code.

You could also try running webui-user.bat from within the forge directory and letting it generate a venv for you.
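
For context, the `~` in `BULUT~1.HAR` looks like a Windows 8.3 short name, which Windows can generate automatically for long folder names, rather than a character in the real username. The sketch below flags path components that follow that pattern. The regular expression is an approximation of the 8.3 form and the function name is hypothetical.

```python
import re
from pathlib import PureWindowsPath

# Approximate 8.3 short-name shape: up to 6 characters, "~", a number,
# and an optional extension of at most 3 characters.
SHORT_NAME = re.compile(r"^[^.~]{1,6}~\d+(\.[^.]{0,3})?$")


def short_name_parts(path):
    """Return the components of a Windows path that look like 8.3 short names."""
    return [part for part in PureWindowsPath(path).parts if SHORT_NAME.match(part)]


if __name__ == "__main__":
    temp_dir = r"C:\Users\BULUT~1.HAR\AppData\Local\Temp"
    print(short_name_parts(temp_dir))
```

A non-empty result only means the path was reported in short form. It does not show that the short form is what breaks the build.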

Apr 30 '25 07:04 MisterChief95

> Have you always had a ~ in your Windows username? That's the only thing that really jumps out at me, and it could be interfering with some path resolution code.
>
> You could also try running webui-user.bat from within the forge directory and letting it generate a venv for you.

Actually, there is no such character in my computer name; it was added in the log output by itself.

As you said, I ran webui-user.bat from the source folder, but it gave an error again and the problem was not fixed.

Apr 30 '25 18:04 bulutharbeli

It's kind of odd that it's trying to access the CUDA toolkit even after you uninstalled Triton. My best suggestion is to back up your models/extensions and reinstall Forge directly using git. Additionally, if you have an RTX 2000+ GPU, I would also just install PyTorch 2.7+cu128 so you get PyTorch attention plus a lot of the xformers speed-ups. There's not a huge reason to use xformers + Triton on Forge anymore if your GPU supports the newer torch versions.
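
If you would rather script the backup than copy folders by hand, something like the following could work. It is a minimal sketch: the folder names `models` and `extensions` come from this reply, while the function name and the example paths are placeholders you would need to adjust.

```python
import shutil
from pathlib import Path


def backup_dirs(forge_root, backup_root, names=("models", "extensions")):
    """Copy the named sub-folders of forge_root into backup_root.

    Returns the names that were actually copied. Folders that do not
    exist are skipped.
    """
    forge_root, backup_root = Path(forge_root), Path(backup_root)
    copied = []
    for name in names:
        src = forge_root / name
        if src.is_dir():
            # dirs_exist_ok lets the backup be re-run into the same target.
            shutil.copytree(src, backup_root / name, dirs_exist_ok=True)
            copied.append(name)
    return copied


# Example call (placeholder paths, adjust to your setup):
# backup_dirs(r"L:\webui_forge_cu121_torch231\webui", r"L:\forge_backup")
```

Model files are large, so the copy can take a while. Check that the backup is complete before deleting the old Forge folder.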

Additionally, for Forge, many of your commandline args are irrelevant now. The only one you have that really does anything is --cuda-malloc.
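
As a small illustration of trimming the argument list, the sketch below keeps only the flags you name and drops the rest. The "keep" set is taken from this reply, not from Forge's documentation, the helper name is invented, and it assumes simple on/off flags with no separate values.

```python
import shlex


def prune_args(commandline_args, keep):
    """Return commandline_args with every flag not in `keep` removed."""
    return " ".join(arg for arg in shlex.split(commandline_args) if arg in keep)


if __name__ == "__main__":
    current = "--xformers --cuda-malloc --opt-sdp-no-mem-attention --medvram"
    print(prune_args(current, {"--cuda-malloc"}))
```

The startup log already reports that `--medvram` is removed in Forge, so at least that flag can be dropped safely.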

I'll include the instructions:

1. Back up your models, extensions, images, etc.
2. (Optional) Install the latest version of Python 3.11.
3. Delete the Forge folder.
4. Open a new PowerShell prompt in the directory where you want to reinstall Forge. (It can usually be opened from the right-click menu in File Explorer on Windows 10/11. It may default to Command Prompt, in which case just enter powershell to switch.)
5. Run the command git clone https://github.com/lllyasviel/stable-diffusion-webui-forge.git
6. Edit webui-user.bat to include the command-line args you want. In this case I suggest the following (assuming an RTX 2000+ GPU):
   ```
   set COMMANDLINE_ARGS=--cuda-malloc --cuda-stream --pin-shared-memory
   ```
7. Start webui-user.bat. It will create the venv in the current directory. Press Ctrl+C when you see "installing requirements" to stop the application.
8. With PowerShell open in the root directory of Forge (where webui-user.bat is), enter these commands:

       venv\Scripts\activate
       python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
       python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"

   This installs torch, then prints the installed torch version and whether CUDA is available.
9. Re-add your models and extensions.
10. Start Forge by running webui-user.bat.
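
If you want a slightly more detailed check than the one-liner in the steps above, the following sketch reports the torch version, the CUDA version torch was built against, and whether a GPU is usable. It assumes it is run inside the activated venv. The function name is made up, and it returns a short "not installed" result instead of failing when torch is missing.

```python
import importlib


def torch_report():
    """Summarize the torch install visible to the running interpreter."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,        # a cu128 wheel reports a "+cu128" suffix
        "cuda_build": torch.version.cuda,    # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False` on an NVIDIA GPU, a CPU-only wheel or an outdated driver would be the first things to look at.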

If you hit any snags along the way we'll circle back and take another look.

May 02 '25 21:05 MisterChief95