Are cuDNN/hipBLAS usable in the Auto1111 AMD fork?
Is ZLUDA with cuDNN and hipBLAS usable in Stable-diffusion-webui-amdgpu? If so, does it have any advantage? For me the speed was the same, and I ran into some issues. Do I need to enable a specific setting or launch argument? I wanted to try out Flash Attention, but I don't know whether Auto1111 supports it.
The preparation:
- AMD RX 7900 XTX
- AMD Adrenalin 25.3.1
- HIP SDK 6.2.4 for Windows 10
- ZLUDA 3.9.0 nightly build for ROCm 6
What I have done:
- Made a fresh install of Stable-diffusion-webui-amdgpu.
- Downloaded the HIP SDK extension and dropped it into ROCm\6.2\, replacing the existing files.
Downloaded hipblaslt-rocmlibs-for-gfx1100-gfx1101-gfx1102-gfx1103-gfx1150-for.hip6.2.7z and had to create the hipblaslt folder in ROCm\6.2\bin\, where I dropped the library folder in. So the Tensile files are in ROCm\6.2\bin\hipblaslt\library (I don't know whether that's correct or not).
Launch args: --use-zluda --update-check --skip-ort --update-all-extensions --models-dir "D:\Programme\AI-Zeug\stable-diffusion-webui-directml\models"
Added these two variables to webui-user.bat:
set ZLUDA_NIGHTLY=1
set DISABLE_ADDMM_CUDA_LT=1
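Put together, the launch args and variables fit into webui-user.bat like this (a sketch; `COMMANDLINE_ARGS` is the standard A1111 mechanism, and the paths are from my setup):

```bat
rem webui-user.bat (sketch combining the variables and launch args above)
set ZLUDA_NIGHTLY=1
set DISABLE_ADDMM_CUDA_LT=1
set COMMANDLINE_ARGS=--use-zluda --update-check --skip-ort --update-all-extensions --models-dir "D:\Programme\AI-Zeug\stable-diffusion-webui-directml\models"

call webui.bat
```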
Replaced cublas64_11.dll, cusparse64_11.dll, cublasLt64_11.dll, cudnn64_9.dll, cudart64_110.dll, and nvrtc64_112_0.dll in venv\Lib\site-packages\torch\lib with the renamed ZLUDA DLLs.
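The replacement step can be sketched as a small script. The ZLUDA source file names here are assumptions based on common ZLUDA setup guides (the archive ships the DLLs without the CUDA version suffixes), so verify them against your ZLUDA folder before running:

```python
import shutil
from pathlib import Path

# Assumed mapping from ZLUDA's shipped DLL names to the CUDA DLL names
# that torch loads from venv\Lib\site-packages\torch\lib. The source
# names are guesses; the target names are the ones listed above.
# cudart64_110.dll is deliberately left out (a reply below notes it is
# not necessary and incomplete).
ZLUDA_TO_TORCH = {
    "cublas.dll":   "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "cublasLt.dll": "cublasLt64_11.dll",
    "cudnn.dll":    "cudnn64_9.dll",
    "nvrtc.dll":    "nvrtc64_112_0.dll",
}

def patch_torch_lib(zluda_dir: Path, torch_lib: Path) -> None:
    """Copy each ZLUDA DLL over the matching CUDA-named DLL in torch/lib."""
    for src_name, dst_name in ZLUDA_TO_TORCH.items():
        src = zluda_dir / src_name
        if src.exists():
            shutil.copy2(src, torch_lib / dst_name)
```

Keep backups of the original torch DLLs so the venv can be restored without a reinstall.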
The Image Gen: After a long compile, image generation works, but at the same speed as my normal no-cuDNN ZLUDA version. Also, I'm not able to upscale with hires fix: out-of-memory crash, black screen, driver timeout.
Problem with extensions: Installed extensions: ADetailer, Booru tag autocompletion, Tiled Diffusion with Tiled VAE. ADetailer and Tiled VAE don't work at all, and the CMD shows this error. Full CMD log after launching, generating one image, then enabling only ADetailer and trying to generate again:
Already up to date.
venv "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-25-g04bf93f1
Commit hash: 04bf93f1e8276526e695577df59fe37dd9bfaaee
ROCm: agents=['gfx1100', 'gfx1036']
ROCm: version=6.2, using agent gfx1100
ZLUDA support: experimental
ROCm hipBLASLt: arch=gfx1100 available=True
Using ZLUDA in D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\.zluda
No ROCm runtime is found, using ROCM_HOME='C:\Program Files\AMD\ROCm\6.2'
Skipping onnxruntime installation.
You are up to date with the most recent release.
Pulled changes for repository in 'D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\extensions\a1111-sd-webui-tagcomplete':
Already up to date.
Pulled changes for repository in 'D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\extensions\adetailer':
Already up to date.
Pulled changes for repository in 'D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\extensions\multidiffusion-upscaler-for-automatic1111':
Already up to date.
D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --update-check --skip-ort --update-all-extensions --models-dir 'D:\Programme\AI-Zeug\stable-diffusion-webui-directml\models'
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 25.3.0, num models: 18
Loading weights [98a8837740] from D:\Programme\AI-Zeug\stable-diffusion-webui-directml\models\Stable-diffusion\SDXL\Illustrious\novaOrangeXL_v60.safetensors
Creating model from config: D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
creating model quickly: OSError
Traceback (most recent call last):
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 409, in hf_raise_for_status
response.raise_for_status()
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\requests\models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\utils\hub.py", line 342, in cached_file
resolved_file = hf_hub_download(
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 862, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 969, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1486, in _raise_on_head_call_error
raise head_call_error
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1376, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 1296, in get_hf_file_metadata
r = _request_wrapper(
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 280, in _request_wrapper
response = _request_wrapper(
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py", line 304, in _request_wrapper
hf_raise_for_status(response)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\utils\_http.py", line 458, in hf_raise_for_status
raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-67d604ea-42bd851f3343850e5ace5785;e3d223bc-1092-474b-bed4-2dac1f2191f6)
Repository Not Found for url: https://huggingface.co/None/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\webyo\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File "C:\Users\webyo\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\webyo\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\modules\initialize.py", line 149, in load_model
shared.sd_model # noqa: B018
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\modules\shared_items.py", line 190, in sd_model
return modules.sd_models.model_data.get_sd_model()
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 693, in get_sd_model
load_model()
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 831, in load_model
sd_model = instantiate_from_config(sd_config.model, state_dict)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 775, in instantiate_from_config
return constructor(**params)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\models\diffusion.py", line 61, in __init__
self.conditioner = instantiate_from_config(
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\util.py", line 175, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\encoders\modules.py", line 88, in __init__
embedder = instantiate_from_config(embconfig)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\util.py", line 175, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\repositories\generative-models\sgm\modules\encoders\modules.py", line 361, in __init__
self.transformer = CLIPTextModel.from_pretrained(version)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\modules\sd_disable_initialization.py", line 68, in CLIPTextModel_from_pretrained
res = self.CLIPTextModel_from_pretrained(None, *model_args, config=pretrained_model_name_or_path, state_dict={}, **kwargs)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\modeling_utils.py", line 262, in _wrapper
return func(*args, **kwargs)
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\modeling_utils.py", line 3540, in from_pretrained
resolved_config_file = cached_file(
File "D:\Programme\AI-Zeug\SD-Zluda-Webui\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\utils\hub.py", line 365, in cached_file
raise EnvironmentError(
OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
Failed to create model quickly; will retry using slow method.
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 15.5s (prepare environment: 16.0s, initialize shared: 1.1s, other imports: 0.5s, load scripts: 1.2s, create ui: 0.8s, gradio launch: 1.4s).
Loading VAE weights specified in settings: D:\Programme\AI-Zeug\stable-diffusion-webui-directml\models\VAE\sdxl_vae_fp16.safetensors
Applying attention optimization: Doggettx... done.
Model loaded in 19.4s (load weights from disk: 0.4s, create model: 8.0s, apply weights to model: 8.9s, apply half(): 0.2s, load VAE: 0.6s, move model to device: 0.2s, load textual inversion embeddings: 0.2s, calculate empty prompt: 0.9s).
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:13<00:00, 2.21it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:10<00:00, 2.85it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:10<00:00, 2.93it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:09<00:00, 2.92it/s]
thread '<unnamed>' panicked at zluda_runtime\src\lib.rs:65:5:
not implemented
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at core\src\panicking.rs:223:5:
panic in a function that cannot unwind
stack backtrace:
0: 0x7fff97d578d1 - _cudaRegisterTexture
1: 0x7fff97d6486a - _cudaRegisterTexture
2: 0x7fff97d560b7 - _cudaRegisterTexture
3: 0x7fff97d57715 - _cudaRegisterTexture
4: 0x7fff97d58c55 - _cudaRegisterTexture
5: 0x7fff97d58a34 - _cudaRegisterTexture
6: 0x7fff97d592e3 - _cudaRegisterTexture
7: 0x7fff97d59132 - _cudaRegisterTexture
8: 0x7fff97d5801f - _cudaRegisterTexture
9: 0x7fff97d58d6e - _cudaRegisterTexture
10: 0x7fff97d6cf65 - _cudaRegisterTexture
11: 0x7fff97d6d013 - _cudaRegisterTexture
12: 0x7fff97d6d091 - _cudaRegisterTexture
13: 0x7fff97d52803 - _cudaPushCallConfiguration
14: 0x7fffcbde1030 - <unknown>
15: 0x7fffcbde4608 - is_exception_typeof
16: 0x7fffedf11c26 - RtlCaptureContext2
17: 0x7fff97d527ea - _cudaPushCallConfiguration
18: 0x7ffee0263f97 - vision::cuda_version
19: 0x7ffee0263bcb - vision::cuda_version
20: 0x7ffee026339a - vision::cuda_version
21: 0x7ffee0265ac8 - vision::cuda_version
22: 0x7ffee0265e1a - vision::cuda_version
23: 0x7ffee02653a3 - vision::cuda_version
24: 0x7ffee02652d4 - vision::cuda_version
25: 0x7ffee0265d4f - vision::cuda_version
26: 0x7ffdd4dfc3ac - c10::Dispatcher::callBoxed
27: 0x7fff70edd240 - torch::jit::invokeOperatorFromPython
28: 0x7fff70eda2c7 - torch::jit::_get_operation_for_overload_or_packet
29: 0x7fff70e42ca6 - registerPythonTensorClass
30: 0x7fff70dea4e6 - registerPythonTensorClass
31: 0x7fff7085140b - c10::ivalue::Future::devices
32: 0x7fff99949eea - PyObject_IsTrue
33: 0x7fff9998bdce - PyObject_Call
34: 0x7fff9998becb - PyObject_Call
35: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
36: 0x7fff999a49d7 - PyFunction_Vectorcall
37: 0x7fff9995a8af - PyObject_FastCallDictTstate
38: 0x7fff99a681f4 - PyObject_Call_Prepend
39: 0x7fff99a68150 - PyBytesWriter_Resize
40: 0x7fff999a9892 - PyEval_EvalFrameDefault
41: 0x7fff999a49d7 - PyFunction_Vectorcall
42: 0x7fff999a7293 - PyEval_EvalFrameDefault
43: 0x7fff999a49d7 - PyFunction_Vectorcall
44: 0x7fff999abacd - PyEval_EvalFrameDefault
45: 0x7fff999a6a94 - PyEval_EvalFrameDefault
46: 0x7fff9994b58b - PyObject_GetDictPtr
47: 0x7fff999be037 - PyGen_Finalize
48: 0x7fff9996ff5b - PyMem_RawMalloc
49: 0x7fff999a6385 - PyEval_EvalFrameDefault
50: 0x7fff9994b58b - PyObject_GetDictPtr
51: 0x7fff9994b47a - PyObject_GetDictPtr
52: 0x7fff9994aa0f - PyObject_GetDictPtr
53: 0x7fff9994a7c4 - PyObject_GetDictPtr
54: 0x7fff999a6033 - PyEval_EvalFrameDefault
55: 0x7fff999a49d7 - PyFunction_Vectorcall
56: 0x7fff9995a917 - PyObject_FastCallDictTstate
57: 0x7fff99a681f4 - PyObject_Call_Prepend
58: 0x7fff99a68150 - PyBytesWriter_Resize
59: 0x7fff9998ffbb - PyObject_MakeTpCall
60: 0x7fff999ac39f - PyEval_EvalFrameDefault
61: 0x7fff999a3615 - PyObject_GC_Malloc
62: 0x7fff9998c00c - PyVectorcall_Call
63: 0x7fff9998be87 - PyObject_Call
64: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
65: 0x7fff999a49d7 - PyFunction_Vectorcall
66: 0x7fff9995a917 - PyObject_FastCallDictTstate
67: 0x7fff99a681f4 - PyObject_Call_Prepend
68: 0x7fff99a68150 - PyBytesWriter_Resize
69: 0x7fff9998ffbb - PyObject_MakeTpCall
70: 0x7fff999ac39f - PyEval_EvalFrameDefault
71: 0x7fff999a49d7 - PyFunction_Vectorcall
72: 0x7fff999abacd - PyEval_EvalFrameDefault
73: 0x7fff999a8620 - PyEval_EvalFrameDefault
74: 0x7fff999a49d7 - PyFunction_Vectorcall
75: 0x7fff9998bfb0 - PyVectorcall_Call
76: 0x7fff9998be87 - PyObject_Call
77: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
78: 0x7fff999a49d7 - PyFunction_Vectorcall
79: 0x7fff999a36f3 - PyObject_GC_Malloc
80: 0x7fff9998bfb0 - PyVectorcall_Call
81: 0x7fff9998be87 - PyObject_Call
82: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
83: 0x7fff999a6a94 - PyEval_EvalFrameDefault
84: 0x7fff999a49d7 - PyFunction_Vectorcall
85: 0x7fff999a6033 - PyEval_EvalFrameDefault
86: 0x7fff999a49d7 - PyFunction_Vectorcall
87: 0x7fff999a7293 - PyEval_EvalFrameDefault
88: 0x7fff999a49d7 - PyFunction_Vectorcall
89: 0x7fff9998bfb0 - PyVectorcall_Call
90: 0x7fff9998be87 - PyObject_Call
91: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
92: 0x7fff999a49d7 - PyFunction_Vectorcall
93: 0x7fff9998bfb0 - PyVectorcall_Call
94: 0x7fff9998be87 - PyObject_Call
95: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
96: 0x7fff999a49d7 - PyFunction_Vectorcall
97: 0x7fff9998bfb0 - PyVectorcall_Call
98: 0x7fff9998be87 - PyObject_Call
99: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
100: 0x7fff999a49d7 - PyFunction_Vectorcall
101: 0x7fff9998bfb0 - PyVectorcall_Call
102: 0x7fff9998be87 - PyObject_Call
103: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
104: 0x7fff999a49d7 - PyFunction_Vectorcall
105: 0x7fff99b53c9d - PyContext_NewHamtForTests
106: 0x7fff99b53f79 - PyContext_NewHamtForTests
107: 0x7fff99964681 - PyArg_CheckPositional
108: 0x7fff9998bfb0 - PyVectorcall_Call
109: 0x7fff9998bd93 - PyObject_Call
110: 0x7fff9998becb - PyObject_Call
111: 0x7fff999ab5e7 - PyEval_EvalFrameDefault
112: 0x7fff999a6a94 - PyEval_EvalFrameDefault
113: 0x7fff999a6a94 - PyEval_EvalFrameDefault
114: 0x7fff999a49d7 - PyFunction_Vectorcall
115: 0x7fff999a3769 - PyObject_GC_Malloc
116: 0x7fff9998bfb0 - PyVectorcall_Call
117: 0x7fff9998bd93 - PyObject_Call
118: 0x7fff99a18962 - PyRuntimeState_Fini
119: 0x7fff99a188de - PyRuntimeState_Fini
120: 0x7fffeb971bb2 - configthreadlocale
121: 0x7fffebf87374 - BaseThreadInitThunk
122: 0x7fffedebcc91 - RtlUserThreadStart
thread caused non-unwinding panic. aborting.
Press any key . . .
cudart.dll is not necessary and incomplete. Simply exclude it.
hipBLASLt will be disabled by DISABLE_ADDMM_CUDA_LT. Unset it, or set it to 0, if you want hipBLASLt enabled. ROCm\6.2\bin\hipblaslt\library is fine as long as hipblaslt.dll is in ROCm\6.2\bin.
(Note that hipBLASLt typically performs worse than rocBLAS.)
cuDNN requires the dev build on A1111.
~~dev.zip~~ (the dev branch has since been merged, so just use the 3.9.1 nightly)
There is no speed gain from enabling Flash Attention, but there is a big improvement in both speed and VRAM usage from enabling the MIOpen Conv2d solver. It is enabled by default if you are using a nightly build.
For 3.9.1, is the MIOpen Conv2d solver enabled, or do I have to download the nightly build too?
cuDNN is not included in the automated GitHub Actions builds because MIOpen is unavailable in the official HIP SDK releases. So you have to use a nightly build.
Thanks for the reply. I upgraded my test webui to ZLUDA 3.9.1 nightly, reset the venv and .zluda folder, and replaced the .zluda files and torch/lib with the renamed ZLUDA files mentioned at the top. I also upgraded my normal webui to ZLUDA 3.9.1 for comparison.
The nightly is a bit faster, but not by much. VRAM usage is nearly the same.
Here are my results (Illustrious model, sampler Euler a):

Auto1111 ZLUDA 3.9.1:
- 832x1216 = 2.81 it/s (11.5 s)
- 832x1216 + hires fix upscale by 1.5, 10 hires steps, upscaler Resrgan4xAnime6b, Tiled VAE enabled = 26.5 seconds

Auto1111 ZLUDA 3.9.1 nightly:
- 832x1216 = 2.97 it/s (10.4 s)
- 832x1216 + hires fix upscale by 1.5, 10 hires steps, upscaler Resrgan4xAnime6b, Tiled VAE enabled = 23.6 seconds
A big difference shows up when I latent-upscale 832x1216 by 2x. (This needs Tiled VAE enabled to avoid running out of memory.)
My normal webui spills into shared VRAM for a short time at the VAE step but does produce an image.
This is the VRAM usage:
The ZLUDA nightly webui spills into shared VRAM too, but it stays there during the VAE decode pass of Tiled VAE and freezes the whole PC.
Maybe I missed something while setting it up?
cuDNN is not included in the automated GitHub Actions builds because MIOpen is unavailable in the official HIP SDK releases. So you have to use a nightly build.
Thanks, I tried cuDNN on 3.9.1 in ZLUDA-ComfyUI without success :(
gfx1150; changed C:\Program Files\AMD\ROCm\6.2. Replaced cublas64_11.dll, cusparse64_11.dll, cublasLt64_11.dll, nvrtc64_112_0.dll, cudnn64_9.dll. Set ZLUDA_NIGHTLY=1 and DISABLE_ADDMM_CUDA_LT=1. Removed torch.backends.cudnn.enabled = False. Launched both with --force-fp32 and without; no change. On the first workflow run after compilation, I got this error:
`thread '<unnamed>' panicked at zluda_dnn\src\lib.rs:1365:14:
[ZLUDA] Unknown descriptor type: 12
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at library\core\src\panicking.rs:218:5:
panic in a function that cannot unwind
stack backtrace:
0: 0x7ffdfc5624c1 - cudnnConvolutionBackwardData
1: 0x7ffdfc56f5aa - cudnnConvolutionBackwardData
2: 0x7ffdfc5608a7 - cudnnConvolutionBackwardData
3: 0x7ffdfc562305 - cudnnConvolutionBackwardData
4: 0x7ffdfc56391f - cudnnConvolutionBackwardData
5: 0x7ffdfc563682 - cudnnConvolutionBackwardData
6: 0x7ffdfc56406f - cudnnConvolutionBackwardData
7: 0x7ffdfc563ec2 - cudnnConvolutionBackwardData
8: 0x7ffdfc562bff - cudnnConvolutionBackwardData
9: 0x7ffdfc563afe - cudnnConvolutionBackwardData
10: 0x7ffdfc5783c5 - cudnnConvolutionBackwardData
11: 0x7ffdfc578473 - cudnnConvolutionBackwardData
12: 0x7ffdfc578555 - cudnnConvolutionBackwardData
13: 0x7ffdfc55cb03 - cudnnBackendCreateDescriptor
14: 0x7ffe191b1030 - <unknown>
15: 0x7ffe191b4608 - is_exception_typeof
16: 0x7ffe214438c6 - RtlCaptureContext2
17: 0x7ffdfc55cae7 - cudnnBackendCreateDescriptor
18: 0x7ffc3d02e4b8 - at::native::cudnn_convolution_transpose
19: 0x7ffc3d01f4c3 - at::native::cudnn_convolution_transpose
20: 0x7ffc3d022d32 - at::native::cudnn_convolution_transpose
21: 0x7ffc3d02bfa7 - at::native::cudnn_convolution_transpose
22: 0x7ffc3d032237 - at::native::cudnn_convolution_transpose
23: 0x7ffc3d03031b - at::native::cudnn_convolution_transpose
24: 0x7ffc3cffd78b - at::native::cudnn_convolution_add_relu
25: 0x7ffc3cffef57 - at::native::cudnn_convolution_transpose
26: 0x7ffc3cffe70e - at::native::cudnn_convolution_transpose
27: 0x7ffc3ecaf1a4 - at::cuda::where_outf
28: 0x7ffc3ebd4bf3 - at::cuda::bucketize_outf
29: 0x7ffc8e08657c - at::TensorMaker::make_tensor
30: 0x7ffc8e160543 - at::_ops::cudnn_convolution_transpose::call
31: 0x7ffc8dab83a2 - at::native::_convolution
32: 0x7ffc8e9cd0dd - at::compositeexplicitautograd::view_copy_symint_outf
33: 0x7ffc8e99bcde - at::compositeexplicitautograd::bucketize_outf
34: 0x7ffc8e086214 - at::TensorMaker::make_tensor
35: 0x7ffc8e12df8e - at::_ops::_convolution::call
36: 0x7ffc8dab73bb - at::native::sym_size
37: 0x7ffc8dac376b - at::native::convolution
38: 0x7ffc8e9cefe3 - at::compositeexplicitautograd::view_copy_symint_outf
39: 0x7ffc8e99bdef - at::compositeexplicitautograd::bucketize_outf
40: 0x7ffc8e0860b0 - at::TensorMaker::make_tensor
41: 0x7ffc8e15c999 - at::_ops::convolution::call
42: 0x7ffc8dac3156 - at::native::conv_transpose2d_symint
43: 0x7ffc8eb6650b - at::compositeimplicitautograd::where
44: 0x7ffc8eb442f3 - at::compositeimplicitautograd::broadcast_to_symint
45: 0x7ffc8e085f99 - at::TensorMaker::make_tensor
46: 0x7ffc8e446980 - at::_ops::conv_transpose2d_input::call
47: 0x7ffc3a00b71f - THPPointer<_frame>::release
48: 0x7ffc3a065863 - THPPointer<_frame>::release
49: 0x7ffdebb49eea - PyObject_IsTrue
50: 0x7ffdebba9892 - PyEval_EvalFrameDefault
51: 0x7ffdebba49d7 - PyFunction_Vectorcall
52: 0x7ffdebba36f3 - PyObject_GC_Malloc
53: 0x7ffdebb8bfb0 - PyVectorcall_Call
54: 0x7ffdebb8be87 - PyObject_Call
55: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
56: 0x7ffdebba49d7 - PyFunction_Vectorcall
57: 0x7ffdebba36f3 - PyObject_GC_Malloc
58: 0x7ffdebb8bfb0 - PyVectorcall_Call
59: 0x7ffdebb8be87 - PyObject_Call
60: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
61: 0x7ffdebba49d7 - PyFunction_Vectorcall
62: 0x7ffdebb5a8af - PyObject_FastCallDictTstate
63: 0x7ffdebc681f4 - PyObject_Call_Prepend
64: 0x7ffdebc68150 - PyBytesWriter_Resize
65: 0x7ffdebbaa598 - PyEval_EvalFrameDefault
66: 0x7ffdebba49d7 - PyFunction_Vectorcall
67: 0x7ffdebba36f3 - PyObject_GC_Malloc
68: 0x7ffdebb8bfb0 - PyVectorcall_Call
69: 0x7ffdebb8be87 - PyObject_Call
70: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
71: 0x7ffdebba49d7 - PyFunction_Vectorcall
72: 0x7ffdebba36f3 - PyObject_GC_Malloc
73: 0x7ffdebb8bfb0 - PyVectorcall_Call
74: 0x7ffdebb8be87 - PyObject_Call
75: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
76: 0x7ffdebba49d7 - PyFunction_Vectorcall
77: 0x7ffdebb5a8af - PyObject_FastCallDictTstate
78: 0x7ffdebc681f4 - PyObject_Call_Prepend
79: 0x7ffdebc68150 - PyBytesWriter_Resize
80: 0x7ffdebba9892 - PyEval_EvalFrameDefault
81: 0x7ffdebba3615 - PyObject_GC_Malloc
82: 0x7ffdebba7293 - PyEval_EvalFrameDefault
83: 0x7ffdebba49d7 - PyFunction_Vectorcall
84: 0x7ffdebb8c00c - PyVectorcall_Call
85: 0x7ffdebb8be87 - PyObject_Call
86: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
87: 0x7ffdebba8620 - PyEval_EvalFrameDefault
88: 0x7ffdebba49d7 - PyFunction_Vectorcall
89: 0x7ffdebb5a917 - PyObject_FastCallDictTstate
90: 0x7ffdebc681f4 - PyObject_Call_Prepend
91: 0x7ffdebc68150 - PyBytesWriter_Resize
92: 0x7ffdebb8bf03 - PyObject_Call
93: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
94: 0x7ffdebba49d7 - PyFunction_Vectorcall
95: 0x7ffdebbabacd - PyEval_EvalFrameDefault
96: 0x7ffdebba3615 - PyObject_GC_Malloc
97: 0x7ffdebb8c00c - PyVectorcall_Call
98: 0x7ffdebb8be87 - PyObject_Call
99: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
100: 0x7ffdebba49d7 - PyFunction_Vectorcall
101: 0x7ffdebba6033 - PyEval_EvalFrameDefault
102: 0x7ffdebba49d7 - PyFunction_Vectorcall
103: 0x7ffdebbabacd - PyEval_EvalFrameDefault
104: 0x7ffdebba49d7 - PyFunction_Vectorcall
105: 0x7ffdebbabacd - PyEval_EvalFrameDefault
106: 0x7ffdebba49d7 - PyFunction_Vectorcall
107: 0x7ffdebba6033 - PyEval_EvalFrameDefault
108: 0x7ffdebba6a94 - PyEval_EvalFrameDefault
109: 0x7ffdebba49d7 - PyFunction_Vectorcall
110: 0x7ffdebb8bfb0 - PyVectorcall_Call
111: 0x7ffdebb8be87 - PyObject_Call
112: 0x7ffdebbab5e7 - PyEval_EvalFrameDefault
113: 0x7ffdebba6a94 - PyEval_EvalFrameDefault
114: 0x7ffdebba6a94 - PyEval_EvalFrameDefault
115: 0x7ffdebba49d7 - PyFunction_Vectorcall
116: 0x7ffdebba3769 - PyObject_GC_Malloc
117: 0x7ffdebb8bfb0 - PyVectorcall_Call
118: 0x7ffdebb8bd93 - PyObject_Call
119: 0x7ffdebc18962 - PyRuntimeState_Fini
120: 0x7ffdebc188de - PyRuntimeState_Fini
121: 0x7ffe1efc37b0 - wcsrchr
122: 0x7ffe1fc5e8d7 - BaseThreadInitThunk
123: 0x7ffe213dbf6c - RtlUserThreadStart
thread caused non-unwinding panic. aborting.`
Without cuDNN it runs much faster! Thanks!
Maybe I need to not only remove torch.backends.cudnn.enabled = False but also enable torch.backends.cuda.enable_cudnn_sdp(True)?
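A sketch of what that toggle would look like. torch.backends.cuda.enable_cudnn_sdp exists only in recent PyTorch releases, so the helper guards for it; the stand-in object at the bottom just exercises the helper without needing torch installed:

```python
from types import SimpleNamespace

def enable_cudnn(backends):
    """Enable cuDNN and, when available, the cuDNN scaled-dot-product
    attention backend. `backends` is torch.backends (or a stand-in)."""
    backends.cudnn.enabled = True
    # enable_cudnn_sdp was added in newer PyTorch versions; skip on older ones.
    sdp = getattr(backends.cuda, "enable_cudnn_sdp", None)
    if sdp is not None:
        sdp(True)

# Stand-in mirroring the torch.backends attributes the helper touches,
# so the logic can be checked without importing torch. In the webui you
# would call enable_cudnn(torch.backends) instead.
fake = SimpleNamespace(
    cudnn=SimpleNamespace(enabled=False),
    cuda=SimpleNamespace(enable_cudnn_sdp=lambda flag: None),
)
enable_cudnn(fake)
```

Whether enabling the cuDNN SDPA backend helps under ZLUDA is an open question; given the zluda_dnn panic above, it may just move the crash into the attention path.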