
HIP error: invalid device function when running ComfyUI

Open byjove01 opened this issue 1 year ago • 20 comments

I'm on Arch Linux (kernel 6.7.4-arch1-1), running ComfyUI inside a Python virtual environment. My GPU is a Radeon RX 5700 XT and my CPU is a Ryzen 5 3600.

HIP error: invalid device function
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

  File "/opt/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/nodes.py", line 56, in encode
    cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/sd.py", line 128, in encode_from_tokens
    cond, pooled = self.cond_stage_model.encode_token_weights(tokens)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/sd1_clip.py", line 514, in encode_token_weights
    out, pooled = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/sd1_clip.py", line 39, in encode_token_weights
    out, pooled = self.encode(to_encode)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/sd1_clip.py", line 190, in encode
    return self(tokens)
           ^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/sd1_clip.py", line 172, in forward
    outputs = self.transformer(tokens, attention_mask, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/clip_model.py", line 131, in forward
    return self.text_model(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/clip_model.py", line 97, in forward
    x = self.embeddings(input_tokens)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/clip_model.py", line 80, in forward
    return self.token_embedding(input_tokens) + self.position_embedding.weight
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1529, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 163, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/opt/ComfyUI/venv/lib/python3.11/site-packages/torch/nn/functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

byjove01 avatar Feb 19 '24 20:02 byjove01

Hi, did you solve this?

nataliameira avatar Feb 20 '24 18:02 nataliameira

Hi, did you solve this?

Sadly no.

byjove01 avatar Feb 20 '24 22:02 byjove01

Hi, did you solve this?

Sadly no.

I managed to install it on my machine using some tutorials from the internet. Would you like to see them? Maybe they'll help you.

nataliameira avatar Feb 21 '24 17:02 nataliameira

Why not! :)

byjove01 avatar Feb 22 '24 17:02 byjove01

Did you set the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0 to override the GPU target for ROCm? The 5700 XT is not officially supported, but you can try to make it work that way.

See also the ComfyUI launch instructions: [screenshot of the README's AMD section]
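
For reference, launching with the override set just for that run looks like this (a minimal sketch; main.py is the standard ComfyUI entry point, adjust the path to your install):

HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py

Or export it so it persists for the whole shell session:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
python main.py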

Th3Rom3 avatar Feb 22 '24 19:02 Th3Rom3

Here's what I got after setting this environment variable up.

Error occurred when executing CheckpointLoaderSimple:

HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.


  File "/opt/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/nodes.py", line 552, in load_checkpoint
    out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/sd.py", line 461, in load_checkpoint_guess_config
    model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/supported_models_base.py", line 51, in get_model
    out = model_base.BaseModel(self, model_type=self.model_type(state_dict, prefix), device=device)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/model_base.py", line 51, in __init__
    self.diffusion_model = UNetModel(**unet_config, device=device, operations=operations)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 807, in __init__
    zero_module(operations.conv_nd(dims, model_channels, out_channels, 3, padding=1, dtype=self.dtype, device=device)),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/ComfyUI/comfy/ldm/modules/diffusionmodules/util.py", line 255, in zero_module
    p.detach().zero_()

byjove01 avatar Feb 23 '24 14:02 byjove01

Just to check, which version of PyTorch are you running? pip show torch

Querying the ROCm issues also turns up some other environment variables that might help in conjunction with the override. No guarantees:

PYTORCH_ROCM_ARCH="gfx1031"
HSA_OVERRIDE_GFX_VERSION=10.3.1
HIP_VISIBLE_DEVICES=0
ROCM_PATH=/opt/rocm
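
If you want to test that combination, a minimal sketch of exporting all four before launching (the values are the ones above; whether they actually suit a 5700 XT is untested):

export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
python main.py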

Th3Rom3 avatar Feb 23 '24 15:02 Th3Rom3

pip show torch =>

Name: torch
Version: 2.3.0.dev20240219+rocm6.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /opt/ComfyUI/venv/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, pytorch-triton-rocm, sympy, typing-extensions
Required-by: torchaudio, torchsde, torchvision

byjove01 avatar Feb 23 '24 15:02 byjove01

If you also need my rocminfo :

=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 3600 6-Core Processor  
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 3600 6-Core Processor  
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16320652(0xf9088c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16320652(0xf9088c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16320652(0xf9088c) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1010                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 5700 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
  Chip ID:                 29471(0x731f)                      
  ASIC Revision:           2(0x2)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2100                               
  BDFID:                   10240                              
  Internal Node ID:        1                                  
  Compute Unit:            40                                 
  SIMDs per CU:            2                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    1280(0x500)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8372224(0x7fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1010:xnack-  
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

byjove01 avatar Feb 23 '24 15:02 byjove01

Have you tried downgrading to PyTorch 2.2 with ROCm 5.7? You can run it even with the ROCm 6.0 binaries installed on the host.

pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
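
After reinstalling, a quick sanity check from inside the venv (a sketch; it just confirms the installed wheel version and whether the HIP device is visible to torch):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"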

Th3Rom3 avatar Feb 23 '24 15:02 Th3Rom3

It's now even worse. Clicking the Queue button doesn't show anything in the UI, so I have to watch my console to see the error.

ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
File "/opt/ComfyUI/execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/nodes.py", line 552, in load_checkpoint
out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/comfy/sd.py", line 461, in load_checkpoint_guess_config
model = model_config.get_model(sd, "model.diffusion_model.", device=inital_load_device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/comfy/supported_models_base.py", line 51, in get_model
out = model_base.BaseModel(self, model_type=self.model_type(state_dict, prefix), device=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/comfy/model_base.py", line 51, in __init__
self.diffusion_model = UNetModel(**unet_config, device=device, operations=operations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 807, in __init__
zero_module(operations.conv_nd(dims, model_channels, out_channels, 3, padding=1, dtype=self.dtype, device=device)),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/ComfyUI/comfy/ldm/modules/diffusionmodules/util.py", line 255, in zero_module
p.detach().zero_()
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.


Prompt executed in 0.06 seconds

byjove01 avatar Feb 23 '24 16:02 byjove01

No idea?

byjove01 avatar Mar 03 '24 11:03 byjove01

Did you find a solution? I'm running into the same issue.

mikafill avatar Mar 03 '24 17:03 mikafill

Sadly not...

byjove01 avatar Mar 03 '24 17:03 byjove01

To my knowledge, the problem is that current torch releases are compiled with hardware features that RDNA1 does not support. You might have to downgrade step by step to a PyTorch version that was compiled with a gfx1010 hardware target, or compile your own. If I remember correctly, it was some release of torch 2.0, compiled against ROCm 5.3 or 5.2.

Of course this might break support on other ends, but it might be worth a try.

The override variable basically tells the backend that you have a different GPU than you actually do, so that it is allowed to run at all. But if the PyTorch package then tries to use functions the hardware doesn't support, it errors out. To the detriment of ML usability, AMD has made a lot of changes to its hardware since Polaris, whereas Nvidia's CUDA stack is much more mature simply because it has been around far longer.
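
One way to check which GPU targets a given wheel was actually built for (a sketch; on ROCm builds, torch.cuda.get_arch_list() should report the gfx targets baked into the binary, so if gfx1010 is absent there are simply no kernels for the 5700 XT in that wheel):

python -c "import torch; print(torch.cuda.get_arch_list())"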

I definitely recall running an early version of Stable Diffusion on my old 5700XT a long time ago.

Th3Rom3 avatar Mar 04 '24 01:03 Th3Rom3

I will try to build PyTorch myself to avoid any other compatibility issues. Just out of curiosity, why is the 5700 XT (gfx1010) no longer supported anyway? It's not that old of a graphics card.

EDIT: Tried building it myself after installing the dependencies, creating a new Python virtual environment, and adding a 20 GB swapfile to keep the enormous linking stage from freezing my computer. But I got this error output: log.txt
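
For anyone else attempting the build, the rough shape of a from-source ROCm build targeting gfx1010 looks something like this (a sketch, not a verified recipe; MAX_JOBS is lowered to cap the RAM used by parallel compile/link jobs):

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
python tools/amd_build/build_amd.py          # "hipify" the CUDA sources for ROCm
PYTORCH_ROCM_ARCH=gfx1010 USE_ROCM=1 MAX_JOBS=4 python setup.py install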

byjove01 avatar Mar 14 '24 07:03 byjove01

Made it work using this

ddeityy avatar Mar 19 '24 22:03 ddeityy

Okay, because building PyTorch was tedious and RAM-hungry, I installed the binaries from the ROCm 5.3 PyTorch repositories, and it apparently works... I mean, the interface renders and the error window no longer appears. But I noticed it couldn't initialize NVML, and it weirdly considers my Radeon to be a CUDA device, I guess...? I still cannot generate anything, because the queue seems to get stuck on the CLIP Text Encode node (it's highlighted in green).

Total VRAM 8176 MB, total RAM 15938 MB
Set vram state to: NORMAL_VRAM
/opt/pytorch-gfx1010-venv/lib/python3.11/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
Device: cuda:0 AMD Radeon RX 5700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type EPS
adm 0
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Requested to load SD1ClipModel
Loading 1 new model

@ddeityy It didn't work for me. My Python env says it's "not compatible with this platform".

byjove01 avatar Mar 20 '24 14:03 byjove01

No more ideas? :(

byjove01 avatar Mar 30 '24 06:03 byjove01

same, following

MaxTran96 avatar Mar 31 '24 10:03 MaxTran96

RDNA1's ROCm support is a tragedy. The best solutions are to buy a newer card or to compile the ROCm libraries yourself.

For Debian users, gfx1010 support is enabled in Trixie's libraries (by the Debian community), but Trixie is an unstable distribution. https://salsa.debian.org/rocm-team/community/team-project/-/wikis/Supported-GPU-list#gfx1010

supersonictw avatar May 07 '24 10:05 supersonictw

To use custom AMD ROCm libraries, it's better to compile PyTorch on your own. The official PyTorch ROCm wheels bundle their own ROCm libraries, so they might misbehave if you point them at your own ROCm build.

I have no RX 5700 card, so I can't test this for you, but these tips might help. Cheer up! :mechanical_arm:

supersonictw avatar May 07 '24 10:05 supersonictw

You can also try some Docker images; they might help! :smiley:
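
For example, the usual ROCm container invocation pattern (a sketch based on AMD's documented device-passthrough flags; verify the image tag against your ROCm version):

docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --security-opt seccomp=unconfined rocm/pytorch:latest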

supersonictw avatar May 07 '24 11:05 supersonictw

Thanks for the tips, I'll try my best.

byjove01 avatar May 07 '24 20:05 byjove01

Thank you all. This is a very helpful thread. I managed to run my script on my AMD Radeon RX 7700S on EndeavourOS (Arch) by running: HSA_OVERRIDE_GFX_VERSION=11.0.0 python script.py

My rocminfo and script are attached. Feel free to use my script to test the performance difference between CPU and GPU with torch.

rocm_cuda_test.zip
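
A quick way to confirm the override is being picked up before running a full workload (a sketch, assuming the same venv):

HSA_OVERRIDE_GFX_VERSION=11.0.0 python -c "import torch; print(torch.cuda.get_device_name(0))"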

spirosbond avatar Jun 01 '24 15:06 spirosbond

I can't run ComfyUI with HSA_OVERRIDE_GFX_VERSION, though a1111 works fine with it:

comfyui-rocm  | [Crystools INFO] CPU: AMD Ryzen 7 7700 8-Core Processor - Arch: x86_64 - OS: Linux 6.1.0-18-amd64
comfyui-rocm  | torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 512.00 MiB of which 17179869183.98 GiB is free. Of the allocated memory 414.15 MiB is allocated by PyTorch, and 1.85 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF

grigio avatar Jul 12 '24 13:07 grigio

comfyui-rocm  | torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 512.00 MiB of which 17179869183.98 GiB is free. Of the allocated memory 414.15 MiB is allocated by PyTorch, and 1.85 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF

It's saying torch.cuda.OutOfMemoryError, so the most likely reason is that your model is too big to fit in your video card's memory.
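
If it's fragmentation rather than a genuinely too-large model, the hint from the error message can be tried via the allocator environment variable (a sketch; 128 is just a starting value, and --lowvram is ComfyUI's reduced-VRAM mode):

PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:128 python main.py --lowvram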

supersonictw avatar Jul 12 '24 13:07 supersonictw

It's saying torch.cuda.OutOfMemoryError, so the most likely reason is that your model is too big to fit in your video card's memory.

OK, I'll investigate more. I just tried the default workflow; with a1111 I can run SDXL fine.

grigio avatar Jul 12 '24 16:07 grigio

@supersonictw Thanks, I tried SDXL with --lowvram; it passes KSampler but then fails at VAE Decode like this: https://github.com/comfyanonymous/ComfyUI/issues/2431

grigio avatar Jul 12 '24 17:07 grigio