stable-diffusion-webui
[Feature Request]: ZLUDA support
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do ?
SD.Next has preliminary support for using ZLUDA on AMD GPUs, and the efficiency is very high: about three times that of ROCm. Could the approach SD.Next used be adopted here for a preliminary port now?
Proposed workflow
https://github.com/lshqqytiger/ZLUDA https://github.com/vladmandic/automatic/wiki/ZLUDA https://github.com/vladmandic/automatic/commit/cc4438651f69973e055b63071bf1d0e3cc558eea https://github.com/vosen/ZLUDA
Additional information
No response
This would be great, as SD.Next doesn't support using ControlNet in txt2img when using ZLUDA with the diffusers backend.
This would be nice. I was able to make it somewhat work with SD.Next, but I really don't like that GUI; I just can't use it effectively. I don't need it here desperately, but I would really love to be able to make comparisons between AMD and Nvidia GPUs using the exact same workflow in a usable UI.
Edit: also, some features I need for my workflow didn't work in SD.Next for me. At least for now, img2img didn't work.
Also, I would appreciate not having to use multiple forks.
It's kinda easy, you can just snip in
torch.backends.cudnn.enabled = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(False)
somewhere around https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/initialize.py#L57
You are right, following the ZLUDA installation guide by SD.Next and using your solution for disabling cudnn makes it work. It is really rather simple to make it work with mainline A1111. Much simpler than ONNX and Olive anyway, LOL. Thanks for your answer. Now I only need to figure out why it randomly causes only around 50-60% GPU load. (It ran at about 4 it/s for around half of the generation and jumped to 9 it/s for the rest, etc., but it looked like it was caused by Windows or the driver, so probably not related to SD; will investigate further.)
Can you explain a little how you made it work? I'm trying with a fresh install of A1111 with no luck. Thanks!
I mostly followed this guide: https://github.com/vladmandic/automatic/wiki/ZLUDA It is for SD.Next, so you can either use SD.Next or skip these parts:
"Install or checkout dev" - I installed mainline automatic1111 instead (don't forget to start it at least once)
"Install CUDA Torch" - should already be present
"Compilation, Settings, and First Generation" - you first need to disable cudnn (it is not yet supported) by adding those lines from wfjsw to the file mentioned above. After that, just start A1111 and generate. (It might take longer to start generating the first time.)
I hope I explained it well enough; sorry, I'm writing on my phone now and running out of time to write :D
I'm trying to add those lines to initialize.py, but I get the following error: NameError: name 'torch' is not defined
I've tried adding them to different parts of the file with no success.
You can add it below https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/initialize.py#L14 instead.
Thanks! That made it work! Not as fast as SD.Next with the diffusers backend, but much more consistent generation.
btw you should use --opt-sdp-attention
if you are not using it
It's working faster now, thanks.
Hey, I managed to get it working on my RX 7800 XT with SD.Next without needing to add any of those lines of code. I just set the environment variable set DISABLE_ADDMM_CUDA_LT=1 in the .bat file I use to launch it, as per the recommendations of the dev of ZLUDA, for PyTorch compatibility: https://github.com/lshqqytiger/ZLUDA?tab=readme-ov-file#pytorch
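For reference, a minimal launcher `.bat` along those lines might look like the sketch below. Only the DISABLE_ADDMM_CUDA_LT line comes from the ZLUDA readme; the file name and the argument list are just example assumptions:

```bat
@echo off
rem Hypothetical webui-user.bat sketch: set the ZLUDA/PyTorch
rem compatibility variable before launching the webui.
set DISABLE_ADDMM_CUDA_LT=1
set COMMANDLINE_ARGS=--opt-sdp-attention
call webui.bat
```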
Hi and thanks everybody,
You can add it below https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/initialize.py#L14 instead.
I'm stuck here. I can start SD, but every time I try to generate something I get this error:
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
I'm not sure if it's related or not. SD.Next is working fine with Zluda, but I need the automatic1111 webui for some reason.
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
That means the code is not correctly applied.
Will my command line be set COMMANDLINE_ARGS=--opt-sdp-attention --no-half --use-directml? SD is not using the GPU, only the CPU...
Remove the --use-directml
Ty @celulari Now I'm getting this error.
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
I put the code on line 14, below import torch, in the initialize.py file:
torch.backends.cudnn.enabled = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(False)
My ZLUDA is OK, I can access it in cmd, the PATH is OK.
Have you tried my suggestion from a few comments above? https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/14946#issuecomment-1960602242
Yes, same error. I'm using a new installation.
My initialize.py looks like this:
I'm using a RX6800
What's the output of:
.\venv\Scripts\activate
python
import torch
ten1 = torch.randn((2, 4,), device="cuda")
ten2 = torch.randn((4, 8,), device="cuda")
torch.mm(ten1, ten2)
I am on SD.Next currently and having the same exact problem with CUBLAS (as this is the topic where that problem was raised).
The arguments set DISABLE_ADDMM_CUDA_LT=1 and --use-zluda are both being passed, and I ensured that the HIP device is visible via AMD_LOG_LEVEL=3 (it also fully compiled the model before throwing this error in the first place, based on zluda.db).
What's the output of:
The error is exactly this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
Did you correctly replace the cublas dll?
I assume not (not that the ZLUDA guide mentioned this specific step). I hadn't replaced anything yet, tbh. I wanted to, but couldn't find WHERE specifically it should be placed/replaced. It should be in the PyTorch directory, shouldn't it? Where specifically is that PyTorch runtime? In .../venv/Lib/site-packages I only see pytorch_lightning and pytorch_lightning-1.9.4.dist-info
...
Oh, I see... It is literally torch, not pytorch. There are two libraries, cublasLt64_11.dll (531.3 MB) and cublas64_11.dll (86.6 MB)... I don't think these are the ones I need to replace, are they? ZLUDA's cublas.dll is only 166 KB in size.
Yes, but I'm getting this error:
OSError: [WinError 126] The specified module could not be found. Error loading "D:\stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.
The latest version of the guide doesn't give the explicit steps because the SD.Next installer will do it for you with --use-zluda.
These are the steps:
- Ensure you are using torch: 2.2.0+cu118:
.\venv\Scripts\activate
pip uninstall torch torchvision torch-directml -y
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118
- Rename these 3 .dll files from the ZLUDA .zip archive:
  - cublas.dll -> cublas64_11.dll
  - cusparse.dll -> cusparse64_11.dll
  - nvrtc.dll -> nvrtc64_112_0.dll
- Replace all of these in the aforementioned sd_install_folder\venv\lib\site-packages\torch\lib folder, and you should be good to go.
Doing exactly these steps...
OSError: [WinError 126] The specified module could not be found. Error loading "D:\stable-diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\lib\caffe2_nvrtc.dll" or one of its dependencies.
Are you using this version of ZLUDA? https://github.com/lshqqytiger/ZLUDA
Also, if you are and it still isn't working, try running python -m torch.utils.collect_env
after activating the venv as per my previous comment and paste the output here.
The latest version of the guide doesn't give the explicit steps because the SD.Next installer will do it for you with
--use-zluda
Apparently not. Unless it does it via a symlink through the /.zluda folder. I already did that and tried the --reinstall flag as well.
Are you using this version of ZLUDA? https://github.com/lshqqytiger/ZLUDA
Yes
Also, if you are and it still isn't working, try running
python -m torch.utils.collect_env
after activating the venv as per my previous comment and paste the output here.
Collecting environment information...
PyTorch version: 2.2.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=3401
DeviceID=CPU0
Family=107
L2CacheSize=4096
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3401
Name=AMD Ryzen 7 5800X3D 8-Core Processor
ProcessorType=3
Revision=8450
Versions of relevant libraries:
[pip3] dctorch==0.1.2
[pip3] numpy==1.26.4
[pip3] onnx==1.15.0
[pip3] onnxruntime==1.17.1
[pip3] onnxruntime-directml==1.17.1
[pip3] open-clip-torch==2.24.0
[pip3] pytorch-lightning==1.9.4
[pip3] torch==2.2.0+cu118
[pip3] torchdiffeq==0.2.3
[pip3] torchmetrics==1.3.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.17.0+cu118
[conda] Could not collect
And HIP does work for me; I even have the HIP 5.7 SDK installed.