ComfyUI
Add basic zluda support
Adds basic ZLUDA support to Comfy. Most of the fixes are from automatic, which recently also added experimental ZLUDA support: https://github.com/vladmandic/automatic/wiki/ZLUDA
Refer to: https://github.com/comfyanonymous/ComfyUI/issues/2810#issuecomment-1950283265
Does this solution no longer work? After following the steps, I receive this error:
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
The issue is not the lack of the AMD HIP SDK being installed as was the issue mentioned here: #2810 (comment)
It works. I just followed all the steps listed in the README.md in this pull request. First, make the changes to comfy/model_management.py and cuda_malloc.py.
After you installed the AMD HIP SDK did you make sure it was added to the windows PATH environment variable?
Did you also remember to download Zluda and also add it to Path?
The next step I did was navigate to python_embeded\Scripts and run python -m pip install --force-reinstall torch==2.2.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Basically, I just followed all the steps listed in the README.md changes in this pull request.
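The two PATH questions above can be checked with a few lines of Python. This is only a sketch: the file names `amdhip64.dll` (HIP SDK) and `nvcuda.dll` (ZLUDA) are assumptions about what those installs ship, and `dirs_on_path_containing` is a hypothetical helper, not part of this pull request.

```python
import os
from pathlib import Path


def dirs_on_path_containing(filename, path_value=None):
    """Return every PATH directory that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        # PATH entries are sometimes wrapped in quotes on Windows.
        entry = entry.strip().strip('"')
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Assumed file names; adjust them to what your HIP SDK and ZLUDA builds contain.
    for name in ("amdhip64.dll", "nvcuda.dll"):
        found = dirs_on_path_containing(name)
        print(name, "->", found if found else "not found on PATH")
```

If either file is reported as not found, the corresponding folder is probably missing from PATH, or the file name differs for your version.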
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I've been getting this error, too. I also tried compiling pytorch myself, as the ZLUDA docs mention, and no dice. Also with torch 2.4.0, 2.5.0, and cu121 (and all permutations therein). I get this error running the default gen. (I also tried compiling it with that flag it mentions -- no diff)
I have followed the methods mentioned here PRECISELY, and this DOES NOT work. For one, the file model_management.py is different than what is pulled. I made the edits and I'm certain it is working because it prints "Detected ZLUDA, support for it is experimental and comfy may not work properly."
I really don't know what else to do.
Well neither do we unless u tell us what exactly isn't working.
After multiple install attempts, this is the error that it repeatedly throws:
Install manually, then open cmd and go to the comfyui-zluda folder:
python -m venv venv
venv\scripts\activate
pip install -r requirements.txt
pip uninstall torch torchvision -y
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118
Then delete the zluda folder if it is there and run patchzluda.bat.
Check that the three DLLs with the correct sizes are present inside the torch lib folder.
If everything worked until this point, you can run
I have deleted the ComfyUI folder, created a new environment, ensured that AMD HIP is installed and in the PATH, and I followed your recommendations precisely. And running with --lowvram produced the same persistent error:
Error occurred when executing CheckpointLoaderSimple:
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\nodes.py", line 516, in load_checkpoint
out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\sd.py", line 473, in load_checkpoint_guess_config
model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\supported_models_base.py", line 60, in get_model
out = model_base.BaseModel(self, model_type=self.model_type(state_dict, prefix), device=device)
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\model_base.py", line 62, in __init__
self.diffusion_model = unet_model(**unet_config, device=device, operations=operations)
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 491, in __init__
operations.Linear(model_channels, time_embed_dim, dtype=self.dtype, device=device),
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\venv\Lib\site-packages\torch\nn\modules\linear.py", line 98, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
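The last frame of that traceback is a plain `torch.empty(...)` call on the CUDA device, so the failure can probably be reproduced without ComfyUI at all. Below is a minimal sketch of such a smoke test; `cuda_smoke_test` is a hypothetical helper, and it takes the torch module as an argument only so that it can be exercised without a GPU.

```python
def cuda_smoke_test(torch_module):
    """Try a tiny allocation on the CUDA device and return (ok, message)."""
    try:
        if not torch_module.cuda.is_available():
            return False, "torch.cuda.is_available() returned False"
        # Same kind of call as the last traceback frame (torch.empty on the device).
        tensor = torch_module.empty((4, 4), device="cuda")
        return True, f"allocated {tuple(tensor.shape)} on {tensor.device}"
    except RuntimeError as err:
        # "CUDA error: operation not supported" surfaces as a RuntimeError.
        return False, f"{type(err).__name__}: {err}"


# Usage inside the ComfyUI venv (requires torch to be installed):
#   import torch
#   print(cuda_smoke_test(torch))
```

If this fails with the same message, the problem sits in the torch/ZLUDA/HIP setup rather than in ComfyUI or the edits to model_management.py.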
For clarity, this is the console output after running python main.py:
C:\Files\stable_diffusion\releases\ComfyUI-Zluda>python main.py
Total VRAM 16368 MB, total RAM 65447 MB
Set vram state to: NORMAL_VRAM
***--------------------------------ZLUDA------------------------------------***
Detected ZLUDA, support for it is experimental and comfy may not work properly.
Disabling cuDNN because ZLUDA does currently not support it.
Disabling flash because ZLUDA does currently not support it.
Enabling math_sdp.
Disabling mem_efficient_sdp because ZLUDA does currently not support it.
***-------------------------------------------------------------------------***
Device: cuda:0 AMD Radeon RX 6800 XT [ZLUDA] : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention
Adding extra search path checkpoints C:\Files\stable_diffusion\models/checkpoints/
Adding extra search path clip C:\Files\stable_diffusion\models/clip/
Adding extra search path clip_vision C:\Files\stable_diffusion\models/clip_vision/
Adding extra search path configs C:\Files\stable_diffusion\models/configs/
Adding extra search path controlnet C:\Files\stable_diffusion\models/controlnet/
Adding extra search path embeddings C:\Files\stable_diffusion\models/embeddings/
Adding extra search path loras C:\Files\stable_diffusion\models/loras/
Adding extra search path upscale_models C:\Files\stable_diffusion\models/upscale_models/
Adding extra search path vae C:\Files\stable_diffusion\models/vae/
Import times for custom nodes:
0.0 seconds: C:\Files\stable_diffusion\releases\ComfyUI-Zluda\custom_nodes\websocket_image_save.py
Starting server
To see the GUI go to: http://127.0.0.1:8188
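For readers wondering what the banner above corresponds to in code, here is a rough sketch. It assumes ZLUDA is detected by the "[ZLUDA]" suffix in the device name (as the log line suggests) and that the four messages map onto the standard `torch.backends` switches. The function names are hypothetical and this is not necessarily how the pull request implements it.

```python
def is_zluda_device(device_name):
    """True when the reported CUDA device name ends with the "[ZLUDA]" tag."""
    return device_name.strip().endswith("[ZLUDA]")


def apply_zluda_workarounds(torch_module):
    """Apply the four toggles named in the startup banner."""
    torch_module.backends.cudnn.enabled = False                 # "Disabling cuDNN"
    torch_module.backends.cuda.enable_flash_sdp(False)          # "Disabling flash"
    torch_module.backends.cuda.enable_math_sdp(True)            # "Enabling math_sdp"
    torch_module.backends.cuda.enable_mem_efficient_sdp(False)  # "Disabling mem_efficient_sdp"


# Usage (requires torch):
#   import torch
#   if is_zluda_device(torch.cuda.get_device_name(0)):
#       apply_zluda_workarounds(torch)
```

Any node that flips one of these switches back on after startup would undo the workaround.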
This PR has nothing to do with the fork of the guy above, make an issue there. But he's trying to copy files that do not exist in his bat script unless he fixed that.
I've been testing this for a couple of weeks now. This is an absolute game changer for us AMD people: it works really well on my AMD-based system, with very few crashes. The only thing I've observed is that I need to reboot my machine before doing other things like games. I'm not a dev, but it feels like something gets stuck in the card's RAM and crashes the next app I try to run after shutting down the ZLUDA-driven ComfyUI.
But good work. It's fantastic!
--lowvram seems to cause no degradation and keeps everything running smoothly if you max out, or come close to maxing out, your VRAM.
Same as @lord-lethris for me. I have been using it with ZLUDA, thanks to @LeagueRaINi, for 2 to 3 months now, after trying to run on CPU and then DirectML. It's wonderful.
I don't think I need to reboot my computer after running generations on my RX7900XTX. I don't use --lowvram or other options, and I can use OBS and work afterwards.
I think the latest version of ComfyUI has broken it :( I'm now getting this error at the start of the KSampler/render:
CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
If I execute with "--disable-xformers", it starts to render, but it gets 20%-80% of the way through the KSampler/render and then the whole system freezes and I need to hard-reset.
Any ideas?
I managed to fix this by downgrading ZLUDA to 3.7, Torch to 2.2.1+cu118 and xFormers to 0.0.25. (Although xFormers is disabled on the command line, some modules force-use it or won't run unless it's present, and 0.0.25 is the version that matches torch 2.2.1, so those modules will continue to run.)
pip install --force-reinstall torch==2.2.1 torchvision torchaudio xFormers==0.0.25 --index-url https://download.pytorch.org/whl/cu118
pip install onnxruntime-gpu==1.18.1 onnxruntime==1.18.1 onnx==1.16.1
(I am aware I have all the ONNX packages in there, and that may be overkill, but I've found it's the same as with xFormers: some modules won't run without them being present, even if they are not all used.)
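To confirm that the venv actually ended up with the versions pinned above (and that a later install did not silently upgrade them), something like the following could be run inside it. This is a sketch: `PINS` simply restates the versions from the two pip commands, and `check_pins` is a hypothetical helper.

```python
from importlib import metadata

# Versions restated from the pip commands above.
PINS = {
    "torch": "2.2.1",
    "xformers": "0.0.25",
    "onnxruntime-gpu": "1.18.1",
    "onnxruntime": "1.18.1",
    "onnx": "1.16.1",
}


def check_pins(pins, get_version=metadata.version):
    """Return a list of human-readable mismatches (empty when all pins match)."""
    problems = []
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (wanted {wanted})")
            continue
        # Ignore local build tags such as "+cu118" when comparing.
        if installed.split("+", 1)[0] != wanted:
            problems.append(f"{package}: found {installed}, wanted {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned versions match"]:
        print(line)
```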
then tweaking the Start-up batch file:
set PYTHON=
set GIT=
set TORCH_CUDA_ARCH_LIST="6.1+PTX"
set CUDAARCHS=61
set CMAKE_CUDA_ARCHITECTURES=61
set USE_SYSTEM_NCCL=1
set USE_EXPERIMENTAL_CUDNN_V8_API=OFF
set CUDA_LAUNCH_BLOCKING=1
set DISABLE_ADDMM_CUDA_LT=1
python main.py --verbose --enable-cors-header '*' --disable-xformers --use-quad-cross-attention --force-fp32 --disable-smart-memory --lowvram
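For anyone who prefers a Python launcher over a batch file, the same settings can be expressed roughly as below. This is a sketch with some assumptions: `build_launch` is a hypothetical helper, the double quotes around `6.1+PTX` are dropped (cmd.exe would keep them as part of the value), and the CORS argument is passed as a bare `*` rather than with the single quotes, which cmd.exe would pass through literally.

```python
import os
import sys

# Environment variables taken from the batch file above.
ZLUDA_ENV = {
    "TORCH_CUDA_ARCH_LIST": "6.1+PTX",
    "CUDAARCHS": "61",
    "CMAKE_CUDA_ARCHITECTURES": "61",
    "USE_SYSTEM_NCCL": "1",
    "USE_EXPERIMENTAL_CUDNN_V8_API": "OFF",
    "CUDA_LAUNCH_BLOCKING": "1",
    "DISABLE_ADDMM_CUDA_LT": "1",
}
# `set PYTHON=` and `set GIT=` clear these variables in the batch file.
CLEARED = ("PYTHON", "GIT")
COMFY_ARGS = [
    "--verbose", "--enable-cors-header", "*", "--disable-xformers",
    "--use-quad-cross-attention", "--force-fp32",
    "--disable-smart-memory", "--lowvram",
]


def build_launch(base_env=None, python=sys.executable):
    """Return (command, environment) equivalent to the batch file."""
    env = dict(os.environ if base_env is None else base_env)
    for name in CLEARED:
        env.pop(name, None)
    env.update(ZLUDA_ENV)
    return [python, "main.py", *COMFY_ARGS], env


# Usage, from the ComfyUI folder:
#   import subprocess
#   cmd, env = build_launch()
#   subprocess.run(cmd, env=env, check=True)
```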
I still cannot stress how good ZLUDA has been.
I ran a test - Same workflow, same seed.
DirectML Render Time:
Loading 1 new model
100%|███████████████████████████| 20/20 [00:24<00:00, 1.20s/it]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 88.92 seconds
ZLUDA Render Time:
Loading 1 new model
100%|███████████████████████████| 20/20 [00:14<00:00, 1.38it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 20.37 seconds
ZLUDA quite literally renders in about ¼ of the time it takes DirectML.
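The arithmetic behind that comparison, using only the numbers from the two logs, works out as follows. Note that the progress bars use different units (seconds per iteration for DirectML, iterations per second for ZLUDA), so the sampling rates need converting before they can be compared.

```python
# Figures copied from the two logs above.
directml_total_s = 88.92    # "Prompt executed in 88.92 seconds"
zluda_total_s = 20.37       # "Prompt executed in 20.37 seconds"
directml_s_per_it = 1.20    # DirectML progress bar: 1.20 s/it
zluda_it_per_s = 1.38       # ZLUDA progress bar: 1.38 it/s

# End-to-end speedup and the fraction of DirectML's time that ZLUDA needs.
total_speedup = directml_total_s / zluda_total_s
time_fraction = zluda_total_s / directml_total_s

# Sampling-only speedup: (1 / 1.20) it/s for DirectML versus 1.38 it/s for ZLUDA.
sampling_speedup = directml_s_per_it * zluda_it_per_s

print(f"total speedup:    {total_speedup:.2f}x")
print(f"time fraction:    {time_fraction:.2f}")
print(f"sampling speedup: {sampling_speedup:.2f}x")
```

The end-to-end run is roughly 4.4 times faster, while the sampling loop alone is roughly 1.7 times faster, so much of the gain in this test appears to come from the stages outside the sampler (model loading, VAE decode and so on).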
Shouldn't print("Device:", torch_device_name) be logging.info("Device: %s", torch_device_name)?
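A minimal sketch of the suggested change, assuming the rest of the codebase logs through the standard `logging` module. The logger name and the `report_device` wrapper are made up for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("zluda_example")  # hypothetical logger name


def report_device(torch_device_name):
    # Lazy %-style formatting: the string is only built if INFO is enabled,
    # and the message respects whatever handlers and levels are configured.
    logger.info("Device: %s", torch_device_name)


report_device("cuda:0 AMD Radeon RX 6800 XT [ZLUDA]")
```

Unlike `print`, this goes through the configured handlers, so it can be filtered by level or redirected to a file together with the other startup messages.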
I ran a test with ROCm 6.1.2 and ZLUDA 3.8, and it works fine. Good job!
I'm interested in this too. @comfyanonymous, I apologize for the ping, but what's the usual consensus for when a PR is ready to merge? I can't test it at the moment, but I'm going to get a friend to merge it on his local Comfy and see how well it performs. We will report anything we find here.
This works, but there are a few other things that still need to be sorted out, like some nodes enabling cuDNN again. I "fixed" that on a different branch on my fork by injecting code into each node, but it's ugly and I would not want to push that. The instructions also need a rewrite.
So I am using a 5700xt. SDNext works great with zluda. When I follow the readme steps, using the same venv as SDNext, one of 2 errors occurs.
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
This is if I do not change the "cusparse64_11.dll".
OSError: [WinError 126] The specified module could not be found. Error loading "D:\Games\SDNext\automatic\venv\lib\site-packages\torch\lib\cusparse64_11.dll" or one of its dependencies.
If I do change the cusparse...dll.
Any ideas on what is happening here? FYI, if I change the cusparse DLL, SDNext no longer works either.
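One way to see which state the venv is in is to compare the DLLs in torch's lib folder with the ones in the ZLUDA folder. This is a sketch under assumptions: only `cusparse64_11.dll` is named in this thread, the other file names in the example mapping follow the usual ZLUDA patching instructions and may differ for your build, and `compare_dlls` is a hypothetical helper. Equal sizes only suggest that a file was replaced; they do not prove it.

```python
from pathlib import Path


def compare_dlls(torch_lib, zluda_dir, mapping):
    """Compare file sizes for each torch DLL name -> ZLUDA DLL name pair."""
    report = {}
    for torch_name, zluda_name in mapping.items():
        torch_file = Path(torch_lib) / torch_name
        zluda_file = Path(zluda_dir) / zluda_name
        torch_size = torch_file.stat().st_size if torch_file.is_file() else None
        zluda_size = zluda_file.stat().st_size if zluda_file.is_file() else None
        report[torch_name] = {
            "torch_size": torch_size,
            "zluda_size": zluda_size,
            # Same size as the ZLUDA copy suggests the file was replaced.
            "looks_patched": torch_size is not None and torch_size == zluda_size,
        }
    return report


# Example mapping (assumed names; only cusparse64_11.dll appears in this thread):
#   mapping = {
#       "cublas64_11.dll": "cublas.dll",
#       "cusparse64_11.dll": "cusparse.dll",
#       "nvrtc64_112_0.dll": "nvrtc.dll",
#   }
#   report = compare_dlls(r"venv\Lib\site-packages\torch\lib", r"path\to\zluda", mapping)
```

A venv shared with SDNext will show the same files to both applications, which would explain why replacing the DLL for one changes the behaviour of the other.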
I was having this problem. The solution was to replace ZLUDA with a version compatible with the ROCm that I have. (It turned out that there are 2 versions of v3.8.)
On the other hand, I always get the CUDA error: out of memory error immediately when reaching the KSampler (if using the preview method TAESD; otherwise it just closes the server), even with --lowvram. If I use the argument --disable-cuda-malloc, only "Preview: None" works. I'm using an RX580 2048SP 8GB, which managed to work with SD.Next.
Error Log
!!! Exception during processing!!! CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "E:\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\ComfyUI\nodes.py", line 1382, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "E:\ComfyUI\nodes.py", line 1352, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "E:\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 22, in informative_sample
raise e
File "E:\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 9, in informative_sample
return original_sample(*args, **kwargs) # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
File "E:\ComfyUI\comfy\sample.py", line 43, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "E:\ComfyUI\custom_nodes\ComfyUI_smZNodes\smZNodes.py", line 1447, in KSampler_sample
return _KSampler_sample(*args, **kwargs)
File "E:\ComfyUI\comfy\samplers.py", line 829, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "E:\ComfyUI\custom_nodes\ComfyUI_smZNodes\smZNodes.py", line 1470, in sample
return _sample(*args, **kwargs)
File "E:\ComfyUI\comfy\samplers.py", line 729, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "E:\ComfyUI\comfy\samplers.py", line 716, in sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "E:\ComfyUI\comfy\samplers.py", line 695, in inner_sample
samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "E:\ComfyUI\comfy\samplers.py", line 600, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "C:\Users\T-GAMER\miniconda3\envs\comfyui\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\ComfyUI\comfy\k_diffusion\sampling.py", line 146, in sample_euler
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigma_hat, 'denoised': denoised})
File "E:\ComfyUI\comfy\samplers.py", line 598, in <lambda>
k_callback = lambda x: callback(x["i"], x["denoised"], x["x"], total_steps)
File "E:\ComfyUI\latent_preview.py", line 94, in callback
preview_bytes = previewer.decode_latent_to_preview_image(preview_format, x0)
File "E:\ComfyUI\latent_preview.py", line 29, in decode_latent_to_preview_image
preview_image = self.decode_latent_to_preview(x0)
File "E:\ComfyUI\latent_preview.py", line 38, in decode_latent_to_preview
return preview_to_image(x_sample)
File "E:\ComfyUI\latent_preview.py", line 20, in preview_to_image
latents_ubyte = latents_ubyte.to(device="cpu", dtype=torch.uint8, non_blocking=comfy.model_management.device_supports_non_blocking(latent_image.device))
RuntimeError: CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I am having the same problem, and even downloading the correct ZLUDA doesn't work: https://github.com/lshqqytiger/ZLUDA/releases/tag/rel.11cc5844514f93161e0e74387f04e2c537705a82
still getting the same errors
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
This is if I do not change the "cusparse64_11.dll".
OSError: [WinError 126] The specified module could not be found. Error loading "D:\Games\SDNext\automatic\venv\lib\site-packages\torch\lib\cusparse64_11.dll" or one of its dependencies.
If I do change the cusparse...dll.
I am also using this command to initialize the venv C:..\automatic\venv\Scripts\python.exe .\main.py