stable-diffusion-webui-amdgpu

[Bug]: --use-zluda falls back to the CPU (extremely slow performance), while --use-directml works but apparently without ZLUDA (slightly better performance, still no more than 2 it/s on the lightest model)

Open · Geekyboi6117 opened this issue 9 months ago • 0 comments

Checklist

  • [x] The issue exists after disabling all extensions
  • [x] The issue exists on a clean installation of webui
  • [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [x] The issue exists in the current version of the webui
  • [ ] The issue has not been reported before recently
  • [x] The issue has been reported before but has not been fixed yet

What happened?

The webui uses the CPU when told to use ZLUDA, and speed is very slow. DirectML works but is not fast either. I have seen videos in which ZLUDA reaches more than 5 it/s on an RX 6800; my GPU is an RX 6600, so I think it should manage at least 3 it/s on the lightest model at 512x512 resolution. I also tried the comfyui-zluda fork and got the same performance, so maybe something is wrong with the ROCm and ZLUDA versions. There is one catch: the comfyui-zluda fork does detect ZLUDA. Here is the log from it:

----------------------ZLUDA-----------------------------
:: ZLUDA detected, disabling non-supported functions.
:: CuDNN, flash_sdp, mem_efficient_sdp disabled).
--------------------------------------------------------
:: Device : AMD Radeon RX 6600 [ZLUDA]

Total VRAM 8176 MB, total RAM 16306 MB
pytorch version: 2.3.0+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 6600 [ZLUDA] : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention


But the generation speed is not fast here either: no more than 2 it/s.
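To check whether the ZLUDA device is actually doing the compute, rather than merely being detected, a minimal PyTorch sketch like this can help (run inside the venv; the matmul size and the comparison against a CPU run are my own choice, not something from either repo):

import time
import torch

# Under ZLUDA the reported device name should include "[ZLUDA]",
# e.g. "AMD Radeon RX 6600 [ZLUDA]".
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # Time a large matmul on the GPU; a discrete GPU should be far
    # faster than the CPU run below.
    x = torch.randn(4096, 4096, device="cuda")
    torch.cuda.synchronize()
    t0 = time.time()
    (x @ x).sum().item()
    torch.cuda.synchronize()
    print(f"gpu matmul: {time.time() - t0:.3f}s")

y = torch.randn(4096, 4096)
t0 = time.time()
(y @ y).sum().item()
print(f"cpu matmul: {time.time() - t0:.3f}s")

If torch.cuda.is_available() returns False, or the GPU time is no better than the CPU time, the work is falling back to the CPU regardless of what the startup banner says. (Note: the very first ZLUDA run compiles and caches kernels, so one initial slow run is expected.)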

Steps to reproduce the problem

  1. Run webui-user.bat
  2. The console log is generated
  3. The UI opens
  4. Generation speed is very slow
  5. The CPU is used even though ZLUDA was requested

What should have happened?

It should use ZLUDA, and it should find the ROCm runtime instead of falling back to ROCM_HOME.
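The console log below prints "No ROCm runtime is found, using ROCM_HOME='C:\Program Files\AMD\ROCm\5.7'". My assumption is that this means the HIP runtime DLL could not be located on PATH; here is a minimal sketch to verify that (amdhip64.dll is the HIP runtime DLL on Windows):

import os

# Scan PATH for the Windows HIP runtime DLL. If it is missing, the
# launcher can only fall back to ROCM_HOME, as seen in the log below.
for d in os.environ.get("PATH", "").split(os.pathsep):
    if os.path.isfile(os.path.join(d, "amdhip64.dll")):
        print("HIP runtime found in:", d)
        break
else:
    # Path taken from the ROCM_HOME value in the log; the "bin"
    # subdirectory is an assumption.
    print(r"amdhip64.dll not on PATH; try adding C:\Program Files\AMD\ROCm\5.7\bin")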

What browsers do you use to access the UI ?

No response

Sysinfo

sysinfo-2025-03-08-15-14.json

Console logs

(venv) E:\AII\sd_AMD\stable-diffusion-webui-amdgpu>webui-user.bat
venv "E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-24-g63895a83
Commit hash: 63895a83f70651865cc9653583c69765009489f3
ROCm: agents=['gfx1032']
ROCm: version=5.7, using agent gfx1032
ZLUDA support: experimental
Using ZLUDA in E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\.zluda
No ROCm runtime is found, using ROCM_HOME='C:\Program Files\AMD\ROCm\5.7'
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --disable-nan-check --opt-sdp-attention --medvram --no-half-vae --opt-split-attention --ckpt-dir 'E:\AII\Models' --precision full --no-half
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
ONNX failed to initialize: Failed to import diffusers.pipelines.pipeline_utils because of the following error (look up to see its traceback):
Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
cannot import name 'Cache' from 'transformers' (E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\__init__.py)
Loading weights [6ce0161689] from E:\AII\Models\v1-5-pruned-emaonly.safetensors
Creating model from config: E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 11.3s (prepare environment: 14.5s, initialize shared: 0.7s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 0.7s).
Applying attention optimization: Doggettx... done.
Model loaded in 2.3s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 1.1s, hijack: 0.1s, calculate empty prompt: 0.1s).

txt2img: CAT
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\modules\safe.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return unsafe_torch_load(filename, *args, **kwargs)
 25%|███████████████████████████████████████████▌                                                                                                                                  | 5/20 [00:36<01:50,  7.35s/it]Interrupted with signal 2 in <frame at 0x000001AAC8819F70, file 'C:\\Users\\ABDULLAH\\AppData\\Local\\Programs\\Python\\Python310\\lib\\threading.py', line 324, code wait>         | 5/20 [00:29<01:37,  6.48s/it]
Terminate batch job (Y/N)? Y

Additional information

As the console log shows, the speed is slow because the CPU is being used. With --use-directml the speed gets to about 2 it/s or less, which is still generally better than the CPU.
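For reference, the progress bar in the console log reports seconds per iteration, so converting it (my own arithmetic):

# The log shows 7.35 s/it at step 5/20; as iterations per second:
print(1 / 7.35)  # ~0.14 it/s, i.e. CPU-level speed, far below the ~2 it/s seen with DirectML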

Geekyboi6117 · Mar 08 '25 15:03