Running ComfyUI with XFX Radeon RX470 GPU
Hi, is it possible to run ComfyUI with an XFX Radeon RX 470 GPU on Windows 10?
It works fine with the CPU, but when I try to run the non-CPU version, it of course gives me a "Found no NVIDIA driver on your system" error:
E:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\main.py", line 76, in <module>
import execution
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 13, in <module>
import nodes
File "E:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 20, in <module>
import comfy.diffusers_load
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\diffusers_load.py", line 4, in <module>
import comfy.sd
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 5, in <module>
from comfy import model_management
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 118, in <module>
total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 87, in get_torch_device
return torch.device(torch.cuda.current_device())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\cuda\__init__.py", line 769, in current_device
_lazy_init()
File "E:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\cuda\__init__.py", line 298, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
So I followed the "Manual Install (Windows, Linux) / Others / DirectML (AMD Cards on Windows)" instructions and tried to install DirectML:
ComfyUI_windows_portable> pip install torch-directml
ERROR: Could not find a version that satisfies the requirement torch-directml (from versions: none)
ERROR: No matching distribution found for torch-directml
and of course running it doesn't work either.
I also tried installing with
pip install torch===2.1.0 torchvision===0.16.0 -f https://download.pytorch.org/whl/torch_stable.html
but it still doesn't work:
E:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --directml
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\main.py", line 76, in <module>
import execution
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 13, in <module>
import nodes
File "E:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 20, in <module>
import comfy.diffusers_load
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\diffusers_load.py", line 4, in <module>
import comfy.sd
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 5, in <module>
from comfy import model_management
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 37, in <module>
import torch_directml
ModuleNotFoundError: No module named 'torch_directml'
Should I do something else to be able to run it on Windows 10 with the RX 470?
Which version of Python are you running? If I recall correctly, 'torch-directml' only supports Python 3.10.x at the newest.
You might have to adjust your Python setup.
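In case it helps, a quick way to check and work around it (paths taken from the logs above; the standard Windows 'py' launcher and a separate Python 3.10 install are assumptions on my part):
E:\ComfyUI_windows_portable>.\python_embeded\python.exe --version
:: assuming Python 3.10 is installed and reachable through the py launcher
py -3.10 -m pip install torch-directml
py -3.10 -m pip install -r ComfyUI\requirements.txt
py -3.10 ComfyUI\main.py --directml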
I was using the embedded Python 3.11 that is inside the "python_embeded" folder.
Now I've installed Python 3.10, reinstalled the requirements, and it runs.
But when I try to create an image it gives me this error:
Error occurred when executing CheckpointLoaderSimple:
Could not allocate tensor with 6553600 bytes. There is not enough GPU video memory available!
This is my GPU
@brunoaduarte Please keep us updated if you have success. I have a ye olde 470 gathering dust, would be nice to put it to use!
It doesn't work, even with the --lowvram parameter (and performance with that is already pretty bad).
E:\ComfyUI_windows_portable>python.exe -s ComfyUI\main.py --windows-standalone-build --directml --use-split-cross-attention --lowvram
Using directml with device:
Total VRAM 1024 MB, total RAM 81839 MB
Set vram state to: LOW_VRAM
Device: privateuseone
VAE dtype: torch.float32
Using split optimization for cross attention
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Requested to load SDXLClipModel
Loading 1 new model
Requested to load AutoencoderKL
Loading 1 new model
loading in lowvram mode 64.0
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 64.0
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:32<00:00, 6.42s/it]
Requested to load AutoencoderKL
Loading 1 new model
loading in lowvram mode 64.0
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 154, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 84, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 77, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 267, in decode
return (vae.decode(samples["samples"]), )
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 244, in decode
pixel_samples[x:x+batch_number] = torch.clamp((self.first_stage_model.decode(samples).to(self.output_device).float() + 1.0) / 2.0, min=0.0, max=1.0)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\models\autoencoder.py", line 202, in decode
dec = self.decoder(dec, **decoder_kwargs)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\model.py", line 635, in forward
h = self.up[i_level].block[i_block](h, temb, **kwargs)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\model.py", line 140, in forward
h = self.norm1(h)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 71, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 67, in forward_comfy_cast_weights
return torch.nn.functional.group_norm(input, self.num_groups, weight, bias, self.eps)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Could not allocate tensor with 603979776 bytes. There is not enough GPU video memory available!
Prompt executed in 42.12 seconds
Bummer! Also, I should have asked, is it the 4GB or 8GB variety you're trying with?
> There is not enough GPU video memory available!
I managed to help someone get started with Fooocus today on a laptop with a GTX 1050, which only has 4GB (maybe even 3GB) of VRAM, so I'm guessing that even the 4GB version should be enough in VRAM terms. No experience with ROCm, but I'm tempted to take a shot at getting it set up on a spare machine. If I make any progress I'll let you know.
As far as I can tell from the short log, you are trying to run an SDXL model.
Did you check if it works with SD 1.5 to begin with? SDXL can be challenging to run in 4GB of VRAM.
If SD 1.5 at 512x512 image size works well, you can go from there.
Additional options to get SDXL working are tiling and potentially reduced models like FP8.
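One concrete thing worth trying: in your log the out-of-memory error happens during the VAE decode step, so swapping the stock VAE Decode node for the built-in VAE Decode (Tiled) node may get you past it. Combined with running the VAE in fp16 it might be enough (the flag exists in ComfyUI; whether it behaves well under DirectML is an assumption on my part):
E:\ComfyUI_windows_portable>python.exe -s ComfyUI\main.py --directml --lowvram --use-split-cross-attention --fp16-vae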
Just a thought... As mentioned, I managed to get Fooocus, which uses SDXL, working on a colleague's GTX 1050 laptop yesterday.
It might be an idea to test quickly with Fooocus, as it has many optimizations applied for low-VRAM cards and supports AMD 'out of the box'. That might actually give you the fastest and easiest way of establishing whether SDXL is possible on the card at all, without further troubleshooting in ComfyUI.
For 4 GB I recommend you only run SD 1.5 models, pruned at fp16 (they are the ones at 1.99 GB on CivitAI). Also try starting ComfyUI with "--fp16-unet"; it should reduce the VRAM use considerably.
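Something like this, for example (same portable layout as in the logs above; whether --fp16-unet plays nicely with DirectML is an assumption):
E:\ComfyUI_windows_portable>python.exe -s ComfyUI\main.py --directml --lowvram --fp16-unet --use-split-cross-attention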
Another thing: lots of nodes are said to be Nvidia/CUDA-only but actually run just fine; you just have to change .to("cuda") in their files to .to("privateuseone"), or to .to(device) if a variable is already there that takes your DirectML device from ComfyUI's model management. Almost no node/plugin is truly CUDA-only, but it can be a headache to adjust. In a few cases I had to cast the tensors to the CPU and back to the GPU, as there are some operations DirectML can't run, like those in the FreeU node.
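A minimal sketch of the kind of edit I mean (the tensors and the FFT example below are made up for illustration; model_management.get_torch_device() is the real helper, the same one visible in the traceback above):
import torch
from comfy import model_management

device = model_management.get_torch_device()  # privateuseone:0 when started with --directml

x = torch.randn(1, 4, 64, 64)

# instead of the hard-coded x.to("cuda") many custom nodes ship with:
x = x.to(device)

# for the few ops DirectML can't run (the FFT-based filtering in FreeU was one for me),
# cast to the CPU, do the op there, and move the result back:
y = torch.fft.fftn(x.to("cpu"), dim=(-2, -1))
y = torch.fft.ifftn(y, dim=(-2, -1)).real.to(device)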
Sincerely, someone with an RX 580 (8 GB) who doesn't even dare to try SDXL; SD 1.5 is already 'slow enough'...
If you have an AMD GPU, use ZLUDA; it makes it possible to use CUDA on AMD cards.
@brunoaduarte: Well, I seem to be late with my response; maybe you already figured it out? Try starting your ComfyUI with the option '--lowvram', like: 'python(3) main.py --lowvram'. This is the only way I get it to run and create pics with my GTX 1050 Ti with only 4 GB of VRAM. Regards, Roger
> @brunoaduarte Please keep us updated if you have success. I have a ye olde 470 gathering dust, would be nice to put it to use!
Oh, I am not alone?