ComfyUI
Force use of VAE decode (tiled)
Feature Idea
ROCm, at least for me on an RX 7800 XT, is quite unstable during the VAE decode step. That step seems to ignore the --reserve-vram argument, and if VRAM usage hits the ceiling the GPU hangs for some unknown reason.
One workaround would be to always use tiled VAE decode, so VRAM usage never spikes and causes instability.
The thing is, it would be nice to have a setting that always uses tiled VAE decode so I don't need to update all my workflows.
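Until such a toggle exists, one way to approximate it (a rough sketch only, not tested against real ComfyUI internals) is to patch the VAE's decode method at startup so every call is routed through the tiled path. The stub class below stands in for ComfyUI's actual VAE class; the real method names and signatures may differ between versions, so verify them before adapting this.

```python
# Sketch of the "always use tiled decode" idea as a startup monkey-patch.
# StubVAE is a stand-in for ComfyUI's real VAE class; actual method names
# and signatures may differ between versions -- this only illustrates the
# patching pattern.

class StubVAE:
    def decode(self, samples):
        # Stand-in for the full-frame decode that can spike VRAM.
        return ("full", samples)

    def decode_tiled(self, samples, tile_size=512):
        # Stand-in for the tiled decode with bounded per-tile memory.
        return ("tiled", samples)

def force_tiled_decode(vae_cls):
    """Replace decode() so every call goes through the tiled path."""
    def decode(self, samples, *args, **kwargs):
        return self.decode_tiled(samples)
    vae_cls.decode = decode

force_tiled_decode(StubVAE)
vae = StubVAE()
mode, _ = vae.decode([1, 2, 3])
print(mode)  # -> tiled
```

Applied once at startup (e.g. from a tiny custom node's `__init__.py`), every existing workflow would then decode tiled without editing the graphs.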
Existing Solutions
No response
Other
See ROCm issue over here: https://github.com/ROCm/ROCm/issues/3580
I'm having this issue, and it's a new issue. I've been using ComfyUI for about a year now without having this problem.
When I run an SDXL workflow and VAE Decode, it often reports out of memory and switches to tiled decode. This is new behavior. I have 16 GB VRAM and I'm decoding 1024x1024 latents. Usually it will still decode after the first run, but on subsequent runs it gets stuck and won't complete the decode.
6900XT, Ubuntu 24.04, torch 2.6.0.dev20241004+rocm6.2
I made a smaller reproduction case: https://github.com/ROCm/ROCm/issues/3580#issuecomment-2461073403. I tried stepping back 100 commits and testing again, and if it still crashed I reverted even further. I got back to around May of last year before a missing Python dependency stopped me from going further back.
So in short, since it was working before, it doesn't seem to be something ComfyUI has broken in their code.
My best guess is that the issue is somewhere in the AMD driver; they have an internal ticket for the bug. It's a shame that all the work is done behind closed doors, so we random noobs have no idea when, or if, any fix will be out.
There's a "VAE Decode (Tiled)" node that you can use if you want to always use tiled decode.
Yes, I know about that one. This ticket was more about getting a config option or toggle to always use the tiled node, so I don't need to remember to swap out the node when I reuse an old image that still has the non-tiled node.
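As a stopgap for old workflows, saved API-format graphs could also be rewritten in bulk. The sketch below assumes ComfyUI's API JSON layout (`{"<id>": {"class_type": ..., "inputs": {...}}}`) and that the `VAEDecodeTiled` node accepts a `tile_size` input; check the actual node inputs in your ComfyUI version before relying on this.

```python
# Hedged sketch: rewrite a saved API-format workflow so every plain
# VAEDecode node becomes a VAEDecodeTiled node.
import json

def retile_workflow(workflow: dict, tile_size: int = 512) -> dict:
    """Return a copy of the workflow with VAEDecode swapped for VAEDecodeTiled."""
    out = json.loads(json.dumps(workflow))  # cheap deep copy
    for node in out.values():
        if node.get("class_type") == "VAEDecode":
            node["class_type"] = "VAEDecodeTiled"
            # Assumed input name; verify against your node's definition.
            node["inputs"]["tile_size"] = tile_size
    return out

wf = {"8": {"class_type": "VAEDecode",
            "inputs": {"samples": ["7", 0], "vae": ["4", 2]}}}
print(retile_workflow(wf)["8"]["class_type"])  # -> VAEDecodeTiled
```

Run over a folder of exported workflow JSON files, this avoids hand-editing each graph, though workflows embedded in PNG metadata would still need re-saving.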
Just tested running a Hunyuan Video workflow on a 7900 XTX. Unfortunately, it hung on the VAE Decode (Tiled) node.
I'm also trying to get a successful generation with a Hunyuan Video workflow on a 7900 XT (Ubuntu on WSL). Generation always gets stuck on VAE decode (both tiled and default) with no further console output. Since it was a first-time generation, I let it run for a couple of hours, but unfortunately no success.
pytorch version: 2.7.0.dev20250215+rocm6.3
What happens if you add the VAE Decode Tiled node manually instead?
You could also check this out if it gets faster: https://github.com/comfyanonymous/ComfyUI/issues/5759#issuecomment-2652490678
Haven't had time to test it yet.
Will it work if you use KJ's VAE decode node? As for the VAE Decode (Tiled) node, does it still hang if you set 256, 64, 8, 8?
Using the tiled VAE decode node with 256, 64, 8, 8 actually let me finally finish a short, two-second video generation. However, the generation process still seems very unstable (for me at least), and I often have to deal with OOMs or the process just getting killed in general (32 GB RAM with Windows still running beside WSL is probably not ideal). I will try again with a couple of changes, maybe with KJ's nodes, and report back.
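For anyone puzzling over those four numbers: they presumably map to the node's tile size, overlap, temporal size, and temporal overlap (worth double-checking against the input names in your ComfyUI version). A small self-contained sketch of how overlapping tiles are laid out along one axis shows why shrinking the tile size caps the peak allocation: each decode pass only touches one tile's worth of pixels, regardless of the full frame size.

```python
def tile_starts(length, tile, overlap):
    """Start offsets for overlapping 1-D tiles covering `length` pixels."""
    stride = tile - overlap
    starts = list(range(0, max(length - overlap, 1), stride))
    # Clamp so the last tile ends exactly at `length`.
    starts = [max(min(s, length - tile), 0) for s in starts]
    return sorted(set(starts))

# A 1024-px axis with 256-px tiles and 64-px overlap:
xs = tile_starts(1024, 256, 64)
print(xs)            # -> [0, 192, 384, 576, 768]
print(len(xs) ** 2)  # -> 25 overlapping 256x256 tiles for a 1024x1024 frame
```

Each of the 25 tiles here covers 1/16 of the frame's area, so the decoder's peak activation memory drops roughly in proportion, at the cost of redundant work in the overlap bands (which exist so tile seams can be blended away).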
I have the same problem using native Ubuntu: 7900 XT, torch 2.7.0.dev20250215+rocm6.3.
VAE Decode (Tiled) with 256, 64, 64, 8 on Ubuntu 24.04.
Before this I also tried Ubuntu 22.04 + ROCm 6.1; same problem.
@Kolopsel @maomi2021 Thanks for the information!
I do have a side request though for anyone still here.
Could you do a little test on EasyAnimate v5.1 7b, on ROCm Linux? My suspicion is that it wouldn't work.
If you have Zluda installed on Windows, could you try to see if EasyAnimate will work? For this one I'm genuinely curious and have a bit of hope.
I think this might be related to this issue.
I tried Windows + Zluda, but it couldn't meet my needs: with the same video size it reported insufficient video memory right at the start and failed to run (it does run under Linux + ROCm).