Running ComfyUI with XFX Radeon RX470 GPU
Hi, is it possible to run ComfyUI with an XFX Radeon RX 470 GPU on Windows 10?
It works fine with the CPU, but when I try to run the non-CPU version, it of course gives me a "Found no NVIDIA driver on your system" error:
E:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\main.py", line 76, in <module>
import execution
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 13, in <module>
import nodes
File "E:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 20, in <module>
import comfy.diffusers_load
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\diffusers_load.py", line 4, in <module>
import comfy.sd
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 5, in <module>
from comfy import model_management
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 118, in <module>
total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 87, in get_torch_device
return torch.device(torch.cuda.current_device())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\cuda\__init__.py", line 769, in current_device
_lazy_init()
File "E:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\cuda\__init__.py", line 298, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
So I followed the "Manual Install (Windows, Linux) / Others / DirectML (AMD Cards on Windows)" instructions and tried to install DirectML:
ComfyUI_windows_portable> pip install torch-directml
ERROR: Could not find a version that satisfies the requirement torch-directml (from versions: none)
ERROR: No matching distribution found for torch-directml
and of course running it doesn't work either.
I also tried installing with
pip install torch===2.1.0 torchvision===0.16.0 -f https://download.pytorch.org/whl/torch_stable.html
but it still doesn't work:
E:\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --directml
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\main.py", line 76, in <module>
import execution
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 13, in <module>
import nodes
File "E:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 20, in <module>
import comfy.diffusers_load
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\diffusers_load.py", line 4, in <module>
import comfy.sd
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 5, in <module>
from comfy import model_management
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 37, in <module>
import torch_directml
ModuleNotFoundError: No module named 'torch_directml'
Should I do something else to be able to run it on Windows 10 with the RX 470?
Which version of Python are you running? If I recall correctly, 'torch-directml' only supports Python 3.10.x at the newest.
You might have to adjust your Python setup.
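In case it helps, a quick way to check and work around it (paths taken from the logs above; the standard Windows 'py' launcher and a separate Python 3.10 install are assumptions on my part):
E:\ComfyUI_windows_portable>.\python_embeded\python.exe --version
:: assuming Python 3.10 is installed and reachable through the py launcher
py -3.10 -m pip install torch-directml
py -3.10 -m pip install -r ComfyUI\requirements.txt
py -3.10 ComfyUI\main.py --directml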
I was using the embedded Python 3.11 that is inside the "python_embeded" folder.
Now I've installed Python 3.10, reinstalled the requirements, and it runs.
But when I try to create an image it gives me this error:
Error occurred when executing CheckpointLoaderSimple:
Could not allocate tensor with 6553600 bytes. There is not enough GPU video memory available!
This is my GPU
@brunoaduarte Please keep us updated if you have success. I have a ye olde 470 gathering dust, would be nice to put it to use!
It doesn't work, even with the --lowvram parameter (and performance with that is already pretty bad).
E:\ComfyUI_windows_portable>python.exe -s ComfyUI\main.py --windows-standalone-build --directml --use-split-cross-attention --lowvram
Using directml with device:
Total VRAM 1024 MB, total RAM 81839 MB
Set vram state to: LOW_VRAM
Device: privateuseone
VAE dtype: torch.float32
Using split optimization for cross attention
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Requested to load SDXLClipModel
Loading 1 new model
Requested to load AutoencoderKL
Loading 1 new model
loading in lowvram mode 64.0
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 64.0
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:32<00:00, 6.42s/it]
Requested to load AutoencoderKL
Loading 1 new model
loading in lowvram mode 64.0
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 154, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 84, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\ComfyUI_windows_portable\ComfyUI\execution.py", line 77, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 267, in decode
return (vae.decode(samples["samples"]), )
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\sd.py", line 244, in decode
pixel_samples[x:x+batch_number] = torch.clamp((self.first_stage_model.decode(samples).to(self.output_device).float() + 1.0) / 2.0, min=0.0, max=1.0)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\models\autoencoder.py", line 202, in decode
dec = self.decoder(dec, **decoder_kwargs)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\model.py", line 635, in forward
h = self.up[i_level].block[i_block](h, temb, **kwargs)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\model.py", line 140, in forward
h = self.norm1(h)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 71, in forward
return self.forward_comfy_cast_weights(*args, **kwargs)
File "E:\ComfyUI_windows_portable\ComfyUI\comfy\ops.py", line 67, in forward_comfy_cast_weights
return torch.nn.functional.group_norm(input, self.num_groups, weight, bias, self.eps)
File "C:\Users\Myself\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Could not allocate tensor with 603979776 bytes. There is not enough GPU video memory available!
Prompt executed in 42.12 seconds
Bummer! Also, I should have asked, is it the 4GB or 8GB variety you're trying with?
> There is not enough GPU video memory available!
I managed to help someone get started with Fooocus today on a laptop with a GTX 1050, which only has 4GB (maybe even 3GB) of VRAM, so I'm guessing that even the 4GB version should be enough in VRAM terms. No experience with ROCm, but I'm tempted to take a shot at getting it set up on a spare machine. If I make any progress I'll let you know.
As far as I can tell from the short log, you are trying to run an SDXL model.
Did you check if it works with SD 1.5 to begin with? SDXL can be challenging to run in 4GB of VRAM.
If SD 1.5 at 512x512 image size works well, you can go from there.
Additional options to get SDXL working are tiling and potentially reduced models like FP8.
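One concrete thing worth trying: in your log the out-of-memory error happens during the VAE decode step, so swapping the stock VAE Decode node for the built-in VAE Decode (Tiled) node may get you past it. Combined with running the VAE in fp16 it might be enough (the flag exists in ComfyUI; whether it behaves well under DirectML is an assumption on my part):
E:\ComfyUI_windows_portable>python.exe -s ComfyUI\main.py --directml --lowvram --use-split-cross-attention --fp16-vae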
Just a thought... As mentioned, I managed to get Fooocus, which uses SDXL, working on a colleague's GTX 1050 laptop yesterday.
It might be an idea to test quickly with Fooocus, as it has many optimizations applied for low-VRAM cards and supports AMD 'out of the box'. That might actually give you the fastest and easiest way of establishing whether SDXL is possible on the card at all, without further troubleshooting in ComfyUI.
For 4 GB I recommend you only run SD 1.5 models, pruned at fp16 (they are the ones at 1.99 GB on CivitAI). Also try starting ComfyUI with "--fp16-unet"; it should reduce the VRAM use considerably.
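Something like this, for example (same portable layout as in the logs above; whether --fp16-unet plays nicely with DirectML is an assumption):
E:\ComfyUI_windows_portable>python.exe -s ComfyUI\main.py --directml --lowvram --fp16-unet --use-split-cross-attention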
Another thing: lots of nodes are said to be Nvidia/CUDA-only but actually run just fine; you just have to change .to("cuda") in their files to .to("privateuseone"), or to .to(device) if a variable is already there that takes your DirectML device from ComfyUI's model management. Almost no node/plugin is truly CUDA-only, but it can be a headache to adjust. In a few cases I had to cast the tensors to the CPU and back to the GPU, as there are some operations DirectML can't run, like those in the FreeU node.
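A minimal sketch of the kind of edit I mean (the tensors and the FFT example below are made up for illustration; model_management.get_torch_device() is the real helper, the same one visible in the traceback above):
import torch
from comfy import model_management

device = model_management.get_torch_device()  # privateuseone:0 when started with --directml

x = torch.randn(1, 4, 64, 64)

# instead of the hard-coded x.to("cuda") many custom nodes ship with:
x = x.to(device)

# for the few ops DirectML can't run (the FFT-based filtering in FreeU was one for me),
# cast to the CPU, do the op there, and move the result back:
y = torch.fft.fftn(x.to("cpu"), dim=(-2, -1))
y = torch.fft.ifftn(y, dim=(-2, -1)).real.to(device)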
Sincerely, someone with an RX 580 (8 GB) who doesn't even dare to try SDXL; SD 1.5 is already 'slow enough'...
If you have an AMD GPU, use ZLUDA; it makes it possible to use CUDA on AMD cards.
@brunoaduarte: Well, I seem to be late with my response; maybe you already figured it out? Try starting your ComfyUI with the option '--lowvram', like: 'python(3) main.py --lowvram'. This is the only way I get it to run and create pics with my GTX 1050 Ti with only 4 GB of VRAM. Regards, Roger
> @brunoaduarte Please keep us updated if you have success. I have a ye olde 470 gathering dust, would be nice to put it to use!
Oh, I am not alone?