
Add basic zluda support

Open LeagueRaINi opened this issue 1 year ago • 15 comments

Adds basic ZLUDA support to Comfy. Most of the fixes are from automatic, which recently also added experimental ZLUDA support: https://github.com/vladmandic/automatic/wiki/ZLUDA

Refer to: https://github.com/comfyanonymous/ComfyUI/issues/2810#issuecomment-1950283265

LeagueRaINi avatar Feb 18 '24 18:02 LeagueRaINi

Does this solution no longer work? After following the steps, I receive this error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

The issue is not the lack of the AMD HIP SDK being installed as was the issue mentioned here: #2810 (comment)

It works. I just followed all the steps listed in the README.md in this pull request. First, make the changes to comfy/model_management.py and cuda_malloc.py.

After you installed the AMD HIP SDK, did you make sure it was added to the Windows PATH environment variable?

Did you also remember to download ZLUDA and add it to PATH?

The next step was to navigate to python_embeded\Scripts and run python -m pip install --force-reinstall torch==2.2.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Basically, I just followed all the steps listed in the README.md changes in this pull request.
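As a quick sanity check after the reinstall (just a sketch, assuming the patched ZLUDA DLLs are already in place), you can confirm that torch is the cu118 build and that the card shows up through ZLUDA:

import torch

print("torch:", torch.__version__)                  # expect something like 2.2.1+cu118
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # With ZLUDA the AMD card is exposed as a CUDA device,
    # typically with "[ZLUDA]" appended to its name.
    print("device:", torch.cuda.get_device_name(0))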

Geeknasty avatar Apr 29 '24 07:04 Geeknasty

CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I've been getting this error, too. I also tried compiling PyTorch myself, as the ZLUDA docs mention, and no dice. Also with torch 2.4.0, 2.5.0, and cu121 (and all permutations thereof). I get this error running the default generation. (I also tried compiling with the flag it mentions; no difference.)

Lightsockie avatar May 06 '24 18:05 Lightsockie

I have followed the methods mentioned here PRECISELY, and this DOES NOT work. For one, the file model_management.py is different from what is pulled. I made the edits, and I'm certain they took effect because it prints "Detected ZLUDA, support for it is experimental and comfy may not work properly."

I really don't know what else to do.

Well, neither do we, unless you tell us what exactly isn't working.

LeagueRaINi avatar May 08 '24 01:05 LeagueRaINi

I have followed the methods mentioned here PRECISELY, and this DOES NOT work. For one, the file model_management.py is different from what is pulled. I made the edits, and I'm certain they took effect because it prints "Detected ZLUDA, support for it is experimental and comfy may not work properly." I really don't know what else to do.

Well, neither do we, unless you tell us what exactly isn't working.

After multiple install attempts, this is the error that it repeatedly throws:

Install manually: open cmd, go to the comfyui-zluda folder, and run:

python -m venv venv
venv\scripts\activate
pip install -r requirements.txt
pip uninstall torch torchvision -y
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118

Then delete the zluda folder if it is there and run patchzluda.bat.

Check if the three DLLs with the correct sizes are there inside torch\lib (see the sketch after these steps).

If everything worked until this point, you can run
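Regarding the "three DLLs" check above, here is a rough way to verify it from Python. The DLL names are only the ones ZLUDA patching usually replaces according to the ZLUDA guides; adjust them to whatever patchzluda.bat actually copies:

from pathlib import Path
import torch

lib = Path(torch.__file__).parent / "lib"   # torch's bundled DLL folder on Windows
for name in ("cublas64_11.dll", "cusparse64_11.dll", "nvrtc64_112_0.dll"):   # assumed names
    p = lib / name
    print(name, "->", f"{p.stat().st_size} bytes" if p.exists() else "MISSING")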

I have deleted the ComfyUI folder, created a new environment, ensured that the AMD HIP SDK is installed and on the PATH, and followed your recommendations precisely. Running with --lowvram produced the same persistent error:

Error occurred when executing CheckpointLoaderSimple:

CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\nodes.py", line 516, in load_checkpoint
out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\sd.py", line 473, in load_checkpoint_guess_config
model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\supported_models_base.py", line 60, in get_model
out = model_base.BaseModel(self, model_type=self.model_type(state_dict, prefix), device=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\model_base.py", line 62, in __init__
self.diffusion_model = unet_model(**unet_config, device=device, operations=operations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\comfy\ldm\modules\diffusionmodules\openaimodel.py", line 491, in __init__
operations.Linear(model_channels, time_embed_dim, dtype=self.dtype, device=device),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\stable_diffusion\releases\ComfyUI-Zluda\venv\Lib\site-packages\torch\nn\modules\linear.py", line 98, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For clarity, this is the console output after running python main.py.

C:\Files\stable_diffusion\releases\ComfyUI-Zluda>python main.py
Total VRAM 16368 MB, total RAM 65447 MB
Set vram state to: NORMAL_VRAM
***--------------------------------ZLUDA------------------------------------***
Detected ZLUDA, support for it is experimental and comfy may not work properly.
Disabling cuDNN because ZLUDA does currently not support it.
Disabling flash because ZLUDA does currently not support it.
Enabling math_sdp.
Disabling mem_efficient_sdp because ZLUDA does currently not support it.
***-------------------------------------------------------------------------***
Device: cuda:0 AMD Radeon RX 6800 XT [ZLUDA] : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention
Adding extra search path checkpoints C:\Files\stable_diffusion\models/checkpoints/
Adding extra search path clip C:\Files\stable_diffusion\models/clip/
Adding extra search path clip_vision C:\Files\stable_diffusion\models/clip_vision/
Adding extra search path configs C:\Files\stable_diffusion\models/configs/
Adding extra search path controlnet C:\Files\stable_diffusion\models/controlnet/
Adding extra search path embeddings C:\Files\stable_diffusion\models/embeddings/
Adding extra search path loras C:\Files\stable_diffusion\models/loras/
Adding extra search path upscale_models C:\Files\stable_diffusion\models/upscale_models/
Adding extra search path vae C:\Files\stable_diffusion\models/vae/

Import times for custom nodes:
   0.0 seconds: C:\Files\stable_diffusion\releases\ComfyUI-Zluda\custom_nodes\websocket_image_save.py

Starting server

To see the GUI go to: http://127.0.0.1:8188
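For reference, the ZLUDA banner in this log corresponds to workarounds along the lines of the following. This is only a rough sketch using torch's public backend switches, not the exact code from this PR:

import torch

if torch.cuda.is_available() and "[ZLUDA]" in torch.cuda.get_device_name(0):
    print("Detected ZLUDA, support for it is experimental and comfy may not work properly.")
    torch.backends.cudnn.enabled = False                 # ZLUDA currently has no cuDNN support
    torch.backends.cuda.enable_flash_sdp(False)          # flash attention is unsupported under ZLUDA
    torch.backends.cuda.enable_math_sdp(True)            # fall back to the plain math SDP kernel
    torch.backends.cuda.enable_mem_efficient_sdp(False)  # memory-efficient SDP is unsupported too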

This PR has nothing to do with the fork from the user above; open an issue there. But his bat script is trying to copy files that do not exist, unless he has fixed that.

LeagueRaINi avatar May 08 '24 02:05 LeagueRaINi

I've been testing this for a couple of weeks now. This is an absolute game changer for us AMD people; it works really well on my AMD-based system, with very few crashes. The only thing I've observed is that I need to reboot my machine to do other things like games and stuff. I'm not a dev, but it feels like something gets stuck in the card's RAM and crashes the next app I try to run after shutting the ZLUDA-driven ComfyUI down.

But good work. It's fantastic!

lord-lethris avatar May 16 '24 21:05 lord-lethris

I've been testing this for a couple of weeks now. This is an absolute game changer for us AMD people; it works really well on my AMD-based system, with very few crashes. The only thing I've observed is that I need to reboot my machine to do other things like games and stuff. I'm not a dev, but it feels like something gets stuck in the card's RAM and crashes the next app I try to run after shutting the ZLUDA-driven ComfyUI down.

But good work. It's fantastic!

--lowvram seems to have zero degradation and keeps everything running smoothly if you max out, or come close to maxing out, your VRAM.

unclemusclez avatar Jun 17 '24 09:06 unclemusclez

Same as @lord-lethris for me. I've been using it with ZLUDA, thanks to @LeagueRaINi, for 2 to 3 months now after trying to run on CPU and then DirectML; it's wonderful.

I don't think I need to reboot my computer after running generations on my RX 7900 XTX. I don't use --lowvram or other options, and I can use OBS and work afterwards.

slhad avatar Jun 18 '24 09:06 slhad

I think the latest version of ComfyUI has broken it :( I'm now getting this error at the start of the KSampler/render:

CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

If I execute with "--disable-xformers", it starts to render, but I get between 20% and 80% of the way through the KSampler/render and then the whole system freezes and I need to hard-reset.

Any Ideas?

lord-lethris avatar Jun 26 '24 16:06 lord-lethris

I think the latest version of ComfyUI has broken it :( I'm now getting this error at the start of the KSampler/render:

CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

If I execute with "--disable-xformers", it starts to render, but I get between 20% and 80% of the way through the KSampler/render and then the whole system freezes and I need to hard-reset.

Any Ideas?

I managed to fix this by downgrading ZLUDA to 3.7, Torch to 2.2.1+cu118, and xFormers to 0.0.25. (Although xFormers is disabled on the command line, some modules force-use it or won't run unless it's present; 0.0.25 is the correct version for torch 2.2.1, so those modules will continue to run.)

pip install --force-reinstall torch==2.2.1 torchvision torchaudio xFormers==0.0.25 --index-url https://download.pytorch.org/whl/cu118
pip install onnxruntime-gpu==1.18.1 onnxruntime==1.18.1 onnx==1.16.1

(I am aware I have all the ONNX packages in there, and that may be overkill, but I've found it's the same situation as with xFormers: some modules won't run without them being present, even if they are not all used.)

Then I tweaked the start-up batch file:

set PYTHON=
set GIT=
set TORCH_CUDA_ARCH_LIST="6.1+PTX"
set CUDAARCHS=61
set CMAKE_CUDA_ARCHITECTURES=61
set USE_SYSTEM_NCCL=1
set USE_EXPERIMENTAL_CUDNN_V8_API=OFF
set CUDA_LAUNCH_BLOCKING=1
set DISABLE_ADDMM_CUDA_LT=1

python main.py --verbose --enable-cors-header '*' --disable-xformers --use-quad-cross-attention --force-fp32 --disable-smart-memory --lowvram
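For anyone reproducing this, here is a quick way to confirm the downgrades actually took effect before launching. This is just a sketch; the expected versions are the ones listed above:

import torch, xformers, onnx, onnxruntime

print("torch      :", torch.__version__, "| cuda", torch.version.cuda)  # expect 2.2.1+cu118 / 11.8
print("xformers   :", xformers.__version__)                             # expect 0.0.25
print("onnxruntime:", onnxruntime.__version__)                          # expect 1.18.1
print("onnx       :", onnx.__version__)                                 # expect 1.16.1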

I still cannot stress enough how good ZLUDA has been.

I ran a test: same workflow, same seed.

DirectML Render Time:

Loading 1 new model
100%|███████████████████████████| 20/20 [00:24<00:00, 1.20s/it]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 88.92 seconds

ZLUDA Render Time:

Loading 1 new model
100%|███████████████████████████| 20/20 [00:14<00:00, 1.38it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 20.37 seconds

ZLUDA quite literally renders in about a quarter of the time it takes DirectML (20.37 seconds vs. 88.92 seconds).

lord-lethris avatar Jul 01 '24 13:07 lord-lethris

print("Device:", torch_device_name) shouldn't be logging.info("Device: %s", torch_device_name) ?

I ran a test with ROCm 6.1.2 and ZLUDA 3.8; it works fine. Good job!

cdkeito avatar Jul 13 '24 20:07 cdkeito

I'm interested in this too. @comfyanonymous, I apologize for the ping, but what's the usual consensus on when a PR is ready to merge? I can't test it at the moment, but I'm going to get a friend to merge it into his local Comfy and see how well it performs; anything we find, we'll talk about over here.

G-370 avatar Aug 10 '24 15:08 G-370

I'm interested in this too. @comfyanonymous, I apologize for the ping, but what's the usual consensus on when a PR is ready to merge? I can't test it at the moment, but I'm going to get a friend to merge it into his local Comfy and see how well it performs; anything we find, we'll talk about over here.

This works, but there are a few other things that still need to be sorted out, like some nodes enabling cuDNN again. I "fixed" that on a different branch of my fork by injecting code into each node, but it's ugly and I would not want to push that. The instructions also need a rewrite.

LeagueRaINi avatar Aug 10 '24 15:08 LeagueRaINi

So I am using a 5700 XT. SDNext works great with ZLUDA. When I follow the README steps, using the same venv as SDNext, one of two errors occurs:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (this is if I do not change cusparse64_11.dll).

OSError: [WinError 126] The specified module could not be found. Error loading "D:\Games\SDNext\automatic\venv\lib\site-packages\torch\lib\cusparse64_11.dll" or one of its dependencies (this is if I do change the cusparse DLL).

Any ideas on what is happening here? FYI, if I change the cusparse DLL, SDNext no longer works either.
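One rough way to narrow down the WinError 126 is to try loading the DLL directly with ctypes, which sometimes surfaces a more specific error than torch does. This is only a diagnostic sketch, reusing the path from the error above; adjust it to your own venv:

import ctypes
from pathlib import Path

# path taken from the error message above; adjust to your own venv
dll = Path(r"D:\Games\SDNext\automatic\venv\lib\site-packages\torch\lib\cusparse64_11.dll")
print("exists:", dll.exists(), "| size:", dll.stat().st_size if dll.exists() else "n/a")
try:
    ctypes.WinDLL(str(dll))  # WinError 126 here usually means a dependency of the DLL is missing
    print("loaded OK")
except OSError as e:
    print("failed to load:", e)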

KungFuFurniture avatar Aug 16 '24 18:08 KungFuFurniture

So I am using a 5700 XT. SDNext works great with ZLUDA. When I follow the README steps, using the same venv as SDNext, one of two errors occurs. RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (this is if I do not change cusparse64_11.dll). OSError: [WinError 126] The specified module could not be found. Error loading "D:\Games\SDNext\automatic\venv\lib\site-packages\torch\lib\cusparse64_11.dll" or one of its dependencies (this is if I do change the cusparse DLL).

Any ideas on what is happening here? FYI, if I change the cusparse DLL, SDNext no longer works either.

I was having this problem. The solution was to replace ZLUDA with a version compatible with the ROCm version I have. (It turned out there are two versions of v3.8.)

On the other hand, I always get a CUDA error: out of memory immediately on reaching the KSampler (when using the TAESD preview method; otherwise it just closes the server), even with --lowvram. If I use the argument --disable-cuda-malloc, only "Preview: None" works. I'm using an RX 580 2048SP 8GB, which managed to work with SD.Next.

Error Log
   !!! Exception during processing!!! CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "E:\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "E:\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "E:\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "E:\ComfyUI\nodes.py", line 1382, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "E:\ComfyUI\nodes.py", line 1352, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "E:\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 22, in informative_sample
    raise e
  File "E:\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 9, in informative_sample
    return original_sample(*args, **kwargs)  # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
  File "E:\ComfyUI\comfy\sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "E:\ComfyUI\custom_nodes\ComfyUI_smZNodes\smZNodes.py", line 1447, in KSampler_sample
    return _KSampler_sample(*args, **kwargs)
  File "E:\ComfyUI\comfy\samplers.py", line 829, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "E:\ComfyUI\custom_nodes\ComfyUI_smZNodes\smZNodes.py", line 1470, in sample
    return _sample(*args, **kwargs)
  File "E:\ComfyUI\comfy\samplers.py", line 729, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "E:\ComfyUI\comfy\samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "E:\ComfyUI\comfy\samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "E:\ComfyUI\comfy\samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "C:\Users\T-GAMER\miniconda3\envs\comfyui\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\ComfyUI\comfy\k_diffusion\sampling.py", line 146, in sample_euler
    callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigma_hat, 'denoised': denoised})
  File "E:\ComfyUI\comfy\samplers.py", line 598, in <lambda>
    k_callback = lambda x: callback(x["i"], x["denoised"], x["x"], total_steps)
  File "E:\ComfyUI\latent_preview.py", line 94, in callback
    preview_bytes = previewer.decode_latent_to_preview_image(preview_format, x0)
  File "E:\ComfyUI\latent_preview.py", line 29, in decode_latent_to_preview_image
    preview_image = self.decode_latent_to_preview(x0)
  File "E:\ComfyUI\latent_preview.py", line 38, in decode_latent_to_preview
    return preview_to_image(x_sample)
  File "E:\ComfyUI\latent_preview.py", line 20, in preview_to_image
    latents_ubyte = latents_ubyte.to(device="cpu", dtype=torch.uint8, non_blocking=comfy.model_management.device_supports_non_blocking(latent_image.device))
RuntimeError: CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

tisThivas avatar Aug 17 '24 17:08 tisThivas

I was having this problem. The solution was to replace ZLUDA with a version compatible with the ROCm version I have. (It turned out there are two versions of v3.8.)

I am having the same problem, and even downloading the correct ZLUDA doesn't work: https://github.com/lshqqytiger/ZLUDA/releases/tag/rel.11cc5844514f93161e0e74387f04e2c537705a82

I am still getting the same errors:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (this is if I do not change cusparse64_11.dll). OSError: [WinError 126] The specified module could not be found. Error loading "D:\Games\SDNext\automatic\venv\lib\site-packages\torch\lib\cusparse64_11.dll" or one of its dependencies (this is if I do change the cusparse DLL).

I am also using this command with the venv: C:..\automatic\venv\Scripts\python.exe .\main.py

brunoCreator avatar Aug 17 '24 18:08 brunoCreator