
[Bug]: ZLUDA doesn't use the AMD GPU, only runs on the CPU. EDIT: speed comparison of DirectML, ZLUDA, and TheRock for gfx1201

Open pptp78ec opened this issue 6 months ago • 10 comments

Checklist

  • [ ] The issue exists after disabling all extensions
  • [x] The issue exists on a clean installation of webui
  • [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [x] The issue exists in the current version of the webui
  • [ ] The issue has not been reported before recently
  • [ ] The issue has been reported before but has not been fixed yet

What happened?

The ZLUDA version uses only the CPU. Even after patching ROCm to detect and use gfx1201 (RX 9070) and setting the HIP_VISIBLE_DEVICES=1 environment variable, it still stubbornly runs on the CPU only (R7 7700); both gfx1036 and gfx1201 stay idle.
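A minimal diagnostic sketch (not from the original report; it assumes the webui's venv python and the standard torch.cuda API, which is what ZLUDA redirects to the AMD GPU) to check whether torch can reach the card at all before launching the webui:

import torch

# If this prints False/0, the webui will silently fall back to the CPU.
print("cuda available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# Tiny op on the device; with ZLUDA the first call may stall while kernels compile.
if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    print((x @ x).sum().item())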

Steps to reproduce the problem

  1. Install clean stable-diffusion-webui-amdgpu
  2. Follow instructions from here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides#amd-automatic1111-with-zluda
  3. Patch ROCm 6.2.4 to use gfx1201 (RX 9070 and 9070XT), method here: https://github.com/IAHispano/Applio/issues/1005#issue-2936981353
  4. Launch webui-user.bat with COMMANDLINE_ARGS=--use-zluda --update-check --skip-ort --no-half
  5. Start generation. In Windows Task Manager you can see that only the CPU is used.

What should have happened?

It should fully utilize gfx1201 (RX 9070).

What browsers do you use to access the UI ?

Google Chrome

Sysinfo

sysinfo.json

Console logs

venv "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-37-g721f6391
Commit hash: 721f6391993ac63fd246603735e2eb2e719ffac0
ROCm: agents=['gfx1201']
ROCm: version=6.2, using agent gfx1201
ZLUDA support: experimental
ZLUDA load: path='D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\.zluda' nightly=False
Skipping onnxruntime installation.
You are up to date with the most recent release.
D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\__init__.py:936: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\cuda\CUDAFunctions.cpp:109.)
  r = torch._C._cuda_getDeviceCount() if nvml_count < 0 else nvml_count
D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --update-check --skip-ort --no-half
Warning: caught exception 'CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.', memory monitor disabled
Loading weights [6ce0161689] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 6.6s (prepare environment: 8.5s, initialize shared: 0.1s, other imports: 0.2s, load scripts: 0.3s, create ui: 0.5s, gradio launch: 0.1s).
Applying attention optimization: InvokeAI... done.
Model loaded in 1.6s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 0.6s).
  4%|██▉                                                                                | 1/28 [00:30<13:41, 30.41s/it]
T

Additional information

No response

pptp78ec avatar Jun 19 '25 19:06 pptp78ec

Hey, first remove --no-half from the webui-user.bat. Then delete the venv folder and the .zluda folder. Both are in the stable-diffusion-webui folder. Also make sure you're on AMD Adrenalin 25.3.1 or 25.4.1, because higher versions are not compatible with the ZLUDA shipped with this webui and it has to be replaced manually to make it work. Then relaunch webui-user.bat.

CS1o avatar Jun 21 '25 16:06 CS1o

Hey, first remove --no-half from the webui-user.bat. Then delete the venv folder and the .zluda folder. Both are in the stable-diffusion-webui folder. Also make sure you're on AMD Adrenalin 25.3.1 or 25.4.1, because higher versions are not compatible with the ZLUDA shipped with this webui and it has to be replaced manually to make it work. Then relaunch webui-user.bat.

It worked. Thank you. Now the main annoyance is the "Compilation is in progress. Please wait" message on each launch. Whatever it's doing, I don't know; I thought it would only do it once. Additionally, I didn't find ZLUDA to be faster than DirectML. In fact, in my use cases, generation is ~10% slower. Oh well. At least I can use upscalers now.

pptp78ec avatar Jun 21 '25 18:06 pptp78ec

Hey, np. Compilation can take a while but shouldn't happen again after a relaunch. But ZLUDA shouldn't be slower than DirectML. It should be 2-4 times faster while using less VRAM at the same time. Can you share a full cmd log and your txt2img settings where it's "slow"? Maybe it's slow because the gfx files for gfx1201 are not as optimized as the others, but I doubt it would be slower than DirectML.

If you'd like to experiment a bit, you can try the new guide for AMD's TheRock project: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides#any-webui-with-therock-by-amd It doesn't need ZLUDA, so it should run at native speed for your GPU. Changing the Python version to 3.11.9 64-bit requires you to delete the venv folder of the other webuis and let them rebuild to make them work again.

CS1o avatar Jun 21 '25 20:06 CS1o

Can you share a full cmd log and your txt2img settings where it's "slow"? Maybe it's slow because the gfx files for gfx1201 are not as optimized as the others, but I doubt it would be slower than DirectML.

Certainly. Here's the positive prompt:

masterpiece,best quality,very aesthetic,absurdres,highers,high definition,(amazing quality),ultra detailed,very awa,highres,newest,year 2024,year 2023, <lora:OilPainting1llust:0.4>,oilpainting,oiloncanvas,canvastexture,visible textured brushstrokes,<lora:Brushwork1llust:0.5>, Brushwork,LayeredTextures,Loose and expressive brushstrokes,Bold and rough brushstrokes,<lora:KonyaKarasue:0.2>, masterpiece,best quality,artistic composition,aesthetic design,-,1girl,solo,ruby rose (rwby),short hair,black hair,red highlights,red cloak,determined expression,action pose,dynamic angle,mechanical background,weapon_over_shoulder,

Negative prompt:

worst quality,old,early,low quality,lowres,signature,username,logo,(bad hands:1.7),(mutated hands:1.6),mammal,anthro,furry,ambiguous form,feral,semi-anthro,poorly drawn face,disfigured,ugly,(missing fingers:1.4),(malformed hands:1.5),(poorly drawn hands:1.4),(too many fingers:1.6),fused fingers,NSFW,

Here's the cmd log for ZLUDA:

venv "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\venv\Scripts\Python.exe" WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next. Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] Version: v1.10.1-amd-37-g721f6391 Commit hash: 721f6391993ac63fd246603735e2eb2e719ffac0 ROCm: agents=['gfx1201'] ROCm: version=6.2, using agent gfx1201 ZLUDA support: experimental ZLUDA load: path='D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\.zluda' nightly=False Skipping onnxruntime installation. You are up to date with the most recent release. D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning) no module 'xformers'. Processing without... no module 'xformers'. Processing without... No module 'xformers'. Proceeding without it. D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_onlyhas been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it frompytorch_lightning.utilities` instead. rank_zero_deprecation( Launching Web UI with arguments: --use-zluda --update-check --skip-ort --medvram --listen --port=7860 --api --cors-allow-origins '*' Loading weights [9ece21c352] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\models\Stable-diffusion\hassakuXLIllustrious_v22.safetensors Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\repositories\generative-models\configs\inference\sd_xl_base.yaml Running on local URL: http://0.0.0.0:7860 Applying attention optimization: Doggettx... done. Model loaded in 3.2s (create model: 0.8s, apply weights to model: 1.4s, calculate empty prompt: 0.9s). Compilation is in progress. Please wait...

To create a public link, set `share=True` in `launch()`.
Startup time: 11.5s (prepare environment: 8.8s, initialize shared: 0.5s, other imports: 0.2s, load scripts: 0.3s, initialize extra networks: 0.3s, create ui: 0.3s, gradio launch: 4.1s, add APIs: 0.3s).
Reusing loaded model hassakuXLIllustrious_v22.safetensors [9ece21c352] to load plantMilkModelSuite_hempII.safetensors [f0a345ef69]
Loading weights [f0a345ef69] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\models\Stable-diffusion\plantMilkModelSuite_hempII.safetensors
Applying attention optimization: Doggettx... done.
Weights loaded in 13.2s (send model to cpu: 0.3s, load weights from disk: 0.3s, apply weights to model: 12.6s).
Couldn't find VAE named sdxl_vae.safetensors; using None instead
  0%|          | 0/30 [00:00<?, ?it/s]Compilation is in progress. Please wait...
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [02:01<00:00, 4.04s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:52<00:00, 1.76s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:52<00:00, 1.71s/it]

DirectML CMD log:

venv "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe" Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] Version: v1.10.1-amd-37-g721f6391 Commit hash: 721f6391993ac63fd246603735e2eb2e719ffac0 D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning) no module 'xformers'. Processing without... no module 'xformers'. Processing without... No module 'xformers'. Proceeding without it. D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_onlyhas been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it frompytorch_lightning.utilities` instead. rank_zero_deprecation( Launching Web UI with arguments: --use-directml --medvram --opt-sub-quad-attention --opt-split-attention --no-half-vae --upcast-sampling --listen --port=7860 --api --cors-allow-origins '*' ONNX failed to initialize: module 'optimum.onnxruntime.modeling_diffusion' has no attribute 'ORTPipelinePart' Loading weights [7eb674963a] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\models\Stable-diffusion\hassakuSD15_v13.safetensors Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml Running on local URL: http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 25.9s (prepare environment: 37.8s, initialize shared: 1.4s, list SD models: 0.1s, load scripts: 0.4s, create ui: 0.6s, gradio launch: 4.2s, add APIs: 0.1s).
Couldn't find VAE named sdxl_vae.safetensors; using None instead
Applying attention optimization: Doggettx... done.
Model loaded in 12.8s (load weights from disk: 0.5s, load config: 0.2s, create model: 0.5s, apply weights to model: 11.0s, calculate empty prompt: 0.5s).
Reusing loaded model hassakuSD15_v13.safetensors [7eb674963a] to load plantMilkModelSuite_hempII.safetensors [f0a345ef69]
Loading weights [f0a345ef69] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\models\Stable-diffusion\plantMilkModelSuite_hempII.safetensors
Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Couldn't find VAE named sdxl_vae.safetensors; using None instead
Applying attention optimization: Doggettx... done.
Model loaded in 30.5s (create model: 0.8s, apply weights to model: 29.0s, calculate empty prompt: 0.5s).
D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu\modules\safe.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return unsafe_torch_load(filename, *args, **kwargs)
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:50<00:00, 1.67s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:45<00:00, 1.53s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:43<00:00, 1.43s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:43<00:00, 1.46s/it]

Strangely, DirectML is slower than it used to be; maybe the issue is the 25.4.1 driver. On 25.6.1 I got ~1.9-2.1 it/s.

Still, ZLUDA is by no means 2x faster than DirectML; the difference is ~20% or so.
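For a backend-to-backend comparison outside the webui, a rough micro-benchmark sketch (the tensor sizes and iteration count here are arbitrary choices, not from the thread) can be run once in each install's venv; the ZLUDA build exposes the GPU through torch.cuda, the DirectML build through the torch_directml package:

import time
import torch

# Pick the device: the ZLUDA install exposes the GPU via torch.cuda,
# the DirectML install via the optional torch_directml package.
try:
    import torch_directml
    device = torch_directml.device()
except ImportError:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(32, 1024, 1024, device=device, dtype=torch.float16)
b = torch.randn(32, 1024, 1024, device=device, dtype=torch.float16)

(a @ b).sum().item()              # warm-up; on ZLUDA this also triggers kernel compilation

t0 = time.perf_counter()
for _ in range(20):
    (a @ b).sum().item()          # .item() forces the asynchronous GPU work to finish
print(f"{(time.perf_counter() - t0) / 20 * 1000:.1f} ms per batched fp16 matmul")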

I'll try playing with TheRock project.

pptp78ec avatar Jun 22 '25 07:06 pptp78ec

The prompt doesn't matter in that case. The important settings are resolution, steps, and sampler.

Also, don't use --medvram with ZLUDA on a 16 GB GPU. That slows everything down.

It's also important not to run anything in the background. For example, Wallpaper Engine causes massive issues when generating at the same time.

So try again without --medvram, because you should be getting speeds in the it/s range, not s/it. Maybe try a resolution of 768x1024 or 832x1216 to test, with Euler a as the sampler and 30 steps.
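Since both installs are launched with --api, a repeatable test at those settings can be scripted against the built-in txt2img endpoint. This is only a sketch; the port, seed, and prompt below are placeholders:

import time
import requests

payload = {
    "prompt": "1girl, solo, masterpiece, best quality",   # placeholder prompt
    "negative_prompt": "worst quality, low quality",
    "sampler_name": "Euler a",
    "steps": 30,
    "width": 832,
    "height": 1216,
    "cfg_scale": 5,
    "seed": 1234,          # fixed seed so every backend does identical work
    "enable_hr": False,    # keep hires fix off for the raw speed comparison
}

t0 = time.perf_counter()
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
elapsed = time.perf_counter() - t0
print(f"{elapsed:.1f}s total, {payload['steps'] / elapsed:.2f} steps/s overall")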

And yeah, try TheRock and let me know the speed you get there! I think it will be the fastest.

CS1o avatar Jun 22 '25 08:06 CS1o

Tried TheRock. It's noticeably faster:

venv "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Scripts\Python.exe" Python 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] Version: v1.10.1 Commit hash: 82a973c04367123ae98bd9abdf80d9eda9b910e2 Launching Web UI with arguments: --skip-python-version-check --opt-sdp-attention W0622 11:56:51.773000 10372 venv\Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs. D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\timm\models\layers_init_.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning) no module 'xformers'. Processing without... no module 'xformers'. Processing without... No module 'xformers'. Proceeding without it. Loading weights [f0a345ef69] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\models\Stable-diffusion\plantMilkModelSuite_hempII.safetensors Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\huggingface_hub\file_download.py:943: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True. warnings.warn( Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 7.3s (prepare environment: 1.4s, import torch: 2.9s, import gradio: 0.9s, setup paths: 0.6s, initialize shared: 0.3s, other imports: 0.2s, load scripts: 0.4s, create ui: 0.4s).
Applying attention optimization: sdp... done.
Model loaded in 5.5s (create model: 0.6s, apply weights to model: 4.1s, load textual inversion embeddings: 0.1s, calculate empty prompt: 0.4s).
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:21<00:00, 1.43it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:38<00:00, 1.29s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:18<00:00, 1.61it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]

ESRGAN upscaling throws errors, though. Back to ZLUDA without --medvram:

venv "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\venv\Scripts\Python.exe" WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next. Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] Version: v1.10.1-amd-37-g721f6391 Commit hash: 721f6391993ac63fd246603735e2eb2e719ffac0 ROCm: agents=['gfx1201'] ROCm: version=6.2, using agent gfx1201 ZLUDA support: experimental ZLUDA load: path='D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda.zluda' nightly=False Skipping onnxruntime installation. You are up to date with the most recent release. D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\venv\lib\site-packages\timm\models\layers_init_.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning) no module 'xformers'. Processing without... no module 'xformers'. Processing without... No module 'xformers'. Proceeding without it. D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities instead. rank_zero_deprecation( Launching Web UI with arguments: --use-zluda --update-check --skip-ort --listen --port=7860 --api --cors-allow-origins '*' Loading weights [f0a345ef69] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\models\Stable-diffusion\plantMilkModelSuite_hempII.safetensors Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui-amdgpu-zluda\repositories\generative-models\configs\inference\sd_xl_base.yaml Running on local URL: http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 31.1s (prepare environment: 40.0s, initialize shared: 1.2s, other imports: 0.5s, load scripts: 0.4s, create ui: 1.0s, gradio launch: 4.3s, add APIs: 0.3s).
Applying attention optimization: Doggettx... done.
Model loaded in 31.3s (load weights from disk: 1.2s, create model: 0.6s, apply weights to model: 28.1s, apply half(): 0.1s, move model to device: 0.1s, load textual inversion embeddings: 0.3s, calculate empty prompt: 0.9s).
Couldn't find VAE named sdxl_vae.safetensors; using None instead
  0%|          | 0/30 [00:00<?, ?it/s]Compilation is in progress. Please wait...
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:55<00:00, 1.85s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:54<00:00, 1.82s/it]

No speed improvements, unfortunately.

pptp78ec avatar Jun 22 '25 09:06 pptp78ec

Thanks for testing. You can remove --opt-sdp-attention so that ESRGAN-based upscaling works again, but the speed will be slower. The best performance/usability I got with TheRock was on ReForge.

Did you use hires fix for the tests too? That also slows things down when used with incorrect settings. For normal speed tests, don't enable it, and if it is enabled, always set hires steps to 10.

Also, do you have the 9070 XT or the non-XT?

CS1o avatar Jun 22 '25 09:06 CS1o

You can remove --opt-sdp-attention so that ESRGAN-based upscaling works again, but the speed will be slower. The best performance/usability I got with TheRock was on ReForge.

No difference; it still throws the same MIOpen exception:

venv "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Scripts\Python.exe" Python 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] Version: v1.10.1 Commit hash: 82a973c04367123ae98bd9abdf80d9eda9b910e2 Launching Web UI with arguments: --skip-python-version-check W0622 12:25:51.589000 19576 venv\Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs. D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\timm\models\layers_init_.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning) no module 'xformers'. Processing without... no module 'xformers'. Processing without... No module 'xformers'. Proceeding without it. Loading weights [f0a345ef69] from D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\models\Stable-diffusion\plantMilkModelSuite_hempII.safetensors Creating model from config: D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\huggingface_hub\file_download.py:943: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True. warnings.warn( Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 7.4s (prepare environment: 1.4s, import torch: 3.0s, import gradio: 0.9s, setup paths: 0.6s, initialize shared: 0.3s, other imports: 0.2s, load scripts: 0.4s, create ui: 0.2s, gradio launch: 0.4s).
Applying attention optimization: Doggettx... done.
Model loaded in 5.4s (load weights from disk: 0.4s, create model: 0.3s, apply weights to model: 4.0s, load textual inversion embeddings: 0.1s, calculate empty prompt: 0.4s).
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]
tiled upscale: 100%|███████████████████████████████████████████████████████████████████| 35/35 [00:03<00:00, 11.21it/s]
MIOpen Error: D:/jam/TheRock/ml-libs/MIOpen/src/ocl/convolutionocl.cpp:275: No suitable algorithm was found to execute the required convolution
*** Error completing request
*** Arguments: ('task(b4sbshgida679fz)', <gradio.routes.Request object at 0x000002A7E7554D90>, 'masterpiece,best quality,very aesthetic,absurdres,highers,high definition,(amazing quality),ultra detailed,very awa,highres,newest,year 2024,year 2023,\n<lora:OilPainting1llust:0.4>,oilpainting,oiloncanvas,canvastexture,visible textured brushstrokes,<lora:Brushwork1llust:0.5>,\nBrushwork,LayeredTextures,Loose and expressive brushstrokes,Bold and rough brushstrokes,<lora:KonyaKarasue:0.2>,\nmasterpiece,best quality,artistic composition,aesthetic design,-,1girl,solo,ruby rose (rwby),short hair,black hair,red highlights,red cloak,determined expression,action pose,dynamic angle,mechanical background,weapon_over_shoulder,', 'worst quality,old,early,low quality,lowres,signature,username,logo,(bad hands:1.7),(mutated hands:1.6),mammal,anthro,furry,ambiguous form,feral,semi-anthro,poorly drawn face,disfigured,ugly,(missing fingers:1.4),(malformed hands:1.5),(poorly drawn hands:1.4),(too many fingers:1.6),fused fingers,NSFW,', [], 1, 1, 5, 1249, 832, True, 0.7, 2, 'R-ESRGAN 4x+ Anime6B', 10, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', ['Clip skip: 2'], 0, 30, 'Euler a', 'Automatic', False, '', 0.8, 2882728019, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\call_queue.py", line 74, in f
        res = list(func(*args, **kwargs))
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\call_queue.py", line 53, in f
        res = func(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\call_queue.py", line 37, in f
        res = func(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\processing.py", line 988, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\processing.py", line 1362, in sample
        return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\processing.py", line 1421, in sample_hr_pass
        samples = images_tensor_to_samples(decoded_samples, approximation_indexes.get(opts.sd_vae_encode_method))
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\modules\sd_samplers_common.py", line 110, in images_tensor_to_samples
        x_latent = model.get_first_stage_encoding(model.encode_first_stage(image))
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
        return func(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\repositories\generative-models\sgm\models\diffusion.py", line 127, in encode_first_stage
        z = self.first_stage_model.encode(x)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\repositories\generative-models\sgm\models\autoencoder.py", line 321, in encode
        return super().encode(x).sample()
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\repositories\generative-models\sgm\models\autoencoder.py", line 308, in encode
        h = self.encoder(x)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 579, in forward
        h = self.down[i_level].block[i_block](hs[-1], temb)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 132, in forward
        h = self.conv1(h)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 599, in network_Conv2d_forward
        return originals.Conv2d_forward(self, input)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "D:\VS_projects\StableDiffusion\sdwui2\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
        return F.conv2d(
    RuntimeError: miopenStatusUnknownError
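The failure happens in a plain F.conv2d during the VAE encode of the hires pass, so a stripped-down repro may be enough to report it against TheRock/MIOpen without the whole webui. This is only a sketch: the 3->128 convolution and the ~2x-upscaled input size roughly mirror the first SDXL VAE encoder layer, and the exact shapes are a guess.

import torch
import torch.nn.functional as F

device = "cuda"  # ROCm builds of PyTorch also use the cuda device namespace
# Roughly the first VAE-encoder conv (3 -> 128 channels, 3x3, padding 1)
# applied to an image upscaled 2x from 832x1249; shapes are assumptions.
x = torch.randn(1, 3, 2498, 1664, device=device, dtype=torch.float16)
w = torch.randn(128, 3, 3, 3, device=device, dtype=torch.float16)

try:
    y = F.conv2d(x, w, padding=1)
    torch.cuda.synchronize()
    print("conv ok:", tuple(y.shape))
except RuntimeError as e:
    print("conv failed:", e)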


Did you use hires fix for the tests too? That also slows things down when used with incorrect settings. For normal speed tests, don't enable it, and if it is enabled, always set hires steps to 10.

No, the speed comparison was done without upscaling.

Also, do you have the 9070 XT or the non-XT?

9070 non-xt

pptp78ec avatar Jun 22 '25 09:06 pptp78ec

Okay, that's strange. But as TheRock is still a very early build, it could just be some bug. For me it worked, and ReForge worked normally with upscaling too.

Okay, I guess for the non-XT variant plus the unoptimized gfx files, that could be normal ZLUDA speed then. But somebody with the same GPU should post their results here to compare.

CS1o avatar Jun 22 '25 09:06 CS1o

Tried ReForge/Forge; I have constant issues with the AMD driver crashing and restarting at the end of generation.

pptp78ec avatar Jun 22 '25 11:06 pptp78ec