stable-diffusion-webui-forge

[Feature Request]: ZLUDA Support?

Open RandomLegend opened this issue 11 months ago • 24 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Heyho,

Currently I use an RTX 3070 and I just ordered an RX 7900 XT. I know ROCm is a thing, but AFAIK it's not nearly as performant as CUDA?

So I found out about ZLUDA and that people got it working on A1111.

Did anyone try this on Forge? I mean, technically it should work just the same way as it does on A1111, right?

Proposed workflow

Not applicable

Additional information

No response

RandomLegend avatar Feb 28 '24 22:02 RandomLegend

i was about to ask about the same thing, hope that @lllyasviel will look into it

yacinesh avatar Feb 29 '24 19:02 yacinesh

I'm playing with ZLUDA today, and will update my comment if/when I learn other relevant details.

Running Win11 x64 + 7900XTX w/ Radeon "Game" driver. (*1)

Currently I use an RTX 3070 and I just ordered an RX 7900 XT. I know ROCm is a thing, but AFAIK it's not nearly as performant as CUDA?

Pretty much. From the benchmarks I've collected/seen/observed, the performance hierarchy is roughly...

  1. Linux-CUDA
  2. Linux-ROCm ≈ Win-CUDA (situational, so I'm calling it a tie)
  3. Win-ZLUDA
  4. Linux-DirectML
  5. Win-DirectML
  6. CPU

So I found out about ZLUDA and that people got it working on A1111.

Yep. I can confirm a wild speedup in A1111 on Windows from DirectML to ZLUDA; the ballpark is around 30x faster (3,000%!). There are more variables than I care to isolate, but as a quick check over 30 runs with batch size 4 and SD 1.5, the 7900 XTX went from an average of 2 s/it to 15 it/s.
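As a sanity check on that ballpark, the two quoted throughputs can be converted to a common unit (a quick sketch; the 2 s/it and 15 it/s figures are the ones from the comment above):

```python
# Convert the quoted pre-ZLUDA throughput (seconds per iteration) and the
# quoted ZLUDA throughput (iterations per second) to a common unit, then
# compute the speedup factor.
before_s_per_it = 2.0          # average before ZLUDA (from the comment)
zluda_it_per_s = 15.0          # average with ZLUDA (from the comment)

before_it_per_s = 1.0 / before_s_per_it       # 0.5 it/s
speedup = zluda_it_per_s / before_it_per_s    # 30.0

print(f"{speedup:.0f}x faster ({speedup * 100:.0f}%)")
```

So "around 30x / 3,000%" is consistent with the raw numbers quoted.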

Besides performance, ROCm (and maybe DirectML?) either doesn't do inpainting, or does it horribly. I watched many hours of inpainting tutorials last year before discovering this. ZLUDA enables reliable inpainting!

ZLUDA also makes deterministic details match: if you're working with something generated on NVIDIA hardware, the results will actually look the same (or as "the same" as they can be, given the countless other variables that affect the output).

Did anyone try this on Forge? I mean, technically it should work just the same way as it does on A1111, right?

I wouldn't know where to start, but I'm happy to try. If we work backwards from the lshqqytiger A1111 DirectML scripts associated with the "--use-zluda" parameter and replace the cublas64_11.dll and cusparse64_11.dll files with the ZLUDA versions, we should be able to get most of the way to a solution.
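For illustration, that DLL swap could be sketched as a small helper (hypothetical, untested against Forge; the rename mapping follows the common ZLUDA-for-SD guides, and the nvrtc entry is an assumption drawn from later comments in this thread):

```python
import shutil
from pathlib import Path

# Assumed mapping from ZLUDA release binaries to the CUDA DLL names that
# torch actually loads. This follows the usual ZLUDA guides and may need
# adjusting for other torch / CUDA versions.
ZLUDA_TO_CUDA = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}

def deploy_zluda(zluda_dir: Path, torch_lib_dir: Path) -> list:
    """Copy each ZLUDA binary over the matching CUDA DLL in torch's lib dir."""
    deployed = []
    for src_name, dst_name in ZLUDA_TO_CUDA.items():
        dst = torch_lib_dir / dst_name
        shutil.copyfile(zluda_dir / src_name, dst)
        deployed.append(dst)
    return deployed

# Example call (paths are placeholders for a typical Windows venv layout):
# deploy_zluda(Path(r"C:\zluda"), Path(r".\venv\Lib\site-packages\torch\lib"))
```

Backing up the original DLLs before overwriting them would make the swap reversible.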

joshaiken avatar Mar 13 '24 03:03 joshaiken

Could you share what COMMANDLINE_ARGS you have set up for option 3) win-zluda? I changed from --use-directml to --use-zluda, but I get a 'RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check' (I don't get that with directml). Thanks!

mongolsteppe avatar Mar 16 '24 15:03 mongolsteppe

https://wikiwiki.jp/sd_toshiaki/%E3%82%B3%E3%83%A1%E3%83%B3%E3%83%88/Nvidia%E4%BB%A5%E5%A4%96%E3%81%AE%E3%82%B0%E3%83%A9%E3%83%9C%E3%81%AB%E9%96%A2%E3%81%97%E3%81%A6

I found this statement on this page helpful.

"I managed to run Forge with ZLUDA v3.5 + 7900XTX as follows: I used AnimagineXLV3 with a batch of 100, and it executed without any errors until the end, so I think it's relatively stable. I'll skip the details about setting the paths and environment variables, since they're the same as for SD.NEXT.

I ran webui.bat from Forge to start it, but it immediately shut down after starting. I reinstalled torch and torchvision:

.\venv\Scripts\activate
pip uninstall torch torchvision -y
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118

Then, I replaced cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll in venv\Lib\site-packages\torch\lib with the ones from ZLUDA.

In modules\initialize.py, under import torch, I added the following lines:

torch.backends.cudnn.enabled = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(False)

That's how I did it."

Bocchi-Chan2023 avatar Mar 20 '24 03:03 Bocchi-Chan2023
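The four workaround flags quoted above can be collected into a small helper for clarity (a sketch based on the quote; the flag names are real torch.backends APIs, but whether all four are needed on a given ZLUDA build is the quoted reporter's finding, not something verified here):

```python
def apply_zluda_sdp_workaround(torch):
    """Apply the quoted ZLUDA attention workaround to an imported torch module.

    cuDNN and the flash / memory-efficient scaled-dot-product kernels rely
    on NVIDIA-specific code paths that ZLUDA does not accelerate, so the
    quoted workaround disables them and falls back to the plain math SDP
    implementation.
    """
    torch.backends.cudnn.enabled = False
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
```

In the quoted recipe these four lines are pasted directly into modules\initialize.py right after `import torch`; wrapping them in a function just makes the patch easier to keep in one place.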

@Bocchi-Chan2023 I tried your instructions but I got this error: [image]

yacinesh avatar Mar 23 '24 13:03 yacinesh

@Bocchi-Chan2023 I tried your instructions but I got this error: [image]

.\venv\Scripts\activate
pip uninstall torch torchvision -y
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118

Bocchi-Chan2023 avatar Mar 23 '24 13:03 Bocchi-Chan2023

@Bocchi-Chan2023 Yes, I already did that. The cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files I copied from the SD.Next folder. Is that normal?

yacinesh avatar Mar 23 '24 14:03 yacinesh

All these guides are for Windows.

PATHs and libraries work a little differently on Linux, and I'd love to see someone get ZLUDA + Forge working on Linux.

RandomLegend avatar Mar 23 '24 14:03 RandomLegend

@Bocchi-Chan2023 Yes, I already did that. The cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files I copied from the SD.Next folder. Is that normal?

I think it's still possible, but my recommendation would be to rename and deploy the binaries downloaded from the latest ZLUDA release :)

Bocchi-Chan2023 avatar Mar 23 '24 15:03 Bocchi-Chan2023

@Bocchi-Chan2023 Yes, I already did that. The cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files I copied from the SD.Next folder. Is that normal?

I think it's still possible, but my recommendation would be to rename and deploy the binaries downloaded from the latest ZLUDA release :)

Am I correct here? [image]

yacinesh avatar Mar 23 '24 15:03 yacinesh

Maybe @lshqqytiger could help out?

Zaakh avatar Mar 27 '24 08:03 Zaakh

@Bocchi-Chan2023 Yes, I already did that. The cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files I copied from the SD.Next folder. Is that normal?

I think it's still possible, but my recommendation would be to rename and deploy the binaries downloaded from the latest ZLUDA release :)

Am I correct here? [image]

yes

Bocchi-Chan2023 avatar Mar 27 '24 10:03 Bocchi-Chan2023

Just to ask all of you: did you all get it working? Because I did, but it needed a couple more steps to install and get running. I don't want to fill this thread unless it's needed.

Grey3016 avatar Apr 05 '24 00:04 Grey3016

@Grey3016 I did not, but then again I am on Linux and the guides I found were for Windows.

I am not unsatisfied with the ROCm performance, but I have no idea what gains I am possibly missing out on with ZLUDA.

RandomLegend avatar Apr 05 '24 05:04 RandomLegend

@Grey3016 I did not, but then again I am on Linux and the guides I found were for Windows.

I am not unsatisfied with the ROCm performance, but I have no idea what gains I am possibly missing out on with ZLUDA.

You aren't. The only reason we're using ZLUDA on Windows is because we don't have ROCm on Windows... yet.

brknsoul avatar Apr 18 '24 21:04 brknsoul

Just to ask all of you: did you all get it working? Because I did, but it needed a couple more steps to install and get running. I don't want to fill this thread unless it's needed.

Would you be able to provide the extra steps you had to take? Thanks.

beosliege avatar Apr 29 '24 12:04 beosliege

ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
Launch with --zluda (optional).
Requirements: Visual C++ Runtime, ROCm 5.7.

lshqqytiger avatar Apr 29 '24 12:04 lshqqytiger

ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
Launch with --zluda (optional).
Requirements: Visual C++ Runtime, ROCm 5.7.

I'm already trying to use your forked Forge, but I'm getting a lot of errors. Where can I report issues?

yacinesh avatar Apr 29 '24 12:04 yacinesh

I enabled the issues feature.

lshqqytiger avatar Apr 29 '24 13:04 lshqqytiger

I enabled the issues feature.

I've finally managed to open it, but it failed to install insightface automatically. Should I install it manually or leave it?

yacinesh avatar Apr 29 '24 13:04 yacinesh

Ignore it if there isn't any issue (e.g. module not found).

lshqqytiger avatar Apr 29 '24 13:04 lshqqytiger

ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
Launch with --zluda (optional).
Requirements: Visual C++ Runtime, ROCm 5.7.

I could not start it in my environment. The runtime and ROCm are already installed. These are the errors I got:

Failed to install ZLUDA: 'Namespace' object has no attribute 'use_zluda_dnn'

RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

File "C:\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\__init__.py", line 284, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Bocchi-Chan2023 avatar May 01 '24 07:05 Bocchi-Chan2023

Could you share what COMMANDLINE_ARGS you have set up for option 3) win-zluda? I changed from --use-directml to --use-zluda, but I get a 'RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check' (I don't get that with directml). Thanks!

./webui.bat --use-zluda --listen --no-half-vae

joshaiken avatar May 01 '24 07:05 joshaiken

ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
Launch with --zluda (optional).
Requirements: Visual C++ Runtime, ROCm 5.7.

I could not start it in my environment. The runtime and ROCm are already installed. These are the errors I got:

Failed to install ZLUDA: 'Namespace' object has no attribute 'use_zluda_dnn'

RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

File "C:\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\__init__.py", line 284, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Will fix

lshqqytiger avatar May 01 '24 08:05 lshqqytiger