ComfyUI
Force use of VAE decode (tiled)
Feature Idea
ROCm, at least for me on an RX 7800 XT, is quite unstable during the VAE decode step. That step seems to ignore the --reserve-vram argument, and if VRAM usage hits the ceiling the GPU hangs for some unknown reason.
One workaround would be to always use tiled VAE decode, so VRAM usage never spikes and causes instability.
The thing is, it would be nice to have a setting that always uses tiled VAE decode so I don't need to update all my workflows.
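Until such a toggle exists, one way to approximate it (a rough sketch only, not tested against real ComfyUI internals) is to patch the VAE's decode method at startup so every call is routed through the tiled path. The stub class below stands in for ComfyUI's actual VAE class; the real method names and signatures may differ between versions, so verify them before adapting this.

```python
# Sketch of the "always use tiled decode" idea as a startup monkey-patch.
# StubVAE is a stand-in for ComfyUI's real VAE class; actual method names
# and signatures may differ between versions -- this only illustrates the
# patching pattern.

class StubVAE:
    def decode(self, samples):
        # Stand-in for the full-frame decode that can spike VRAM.
        return ("full", samples)

    def decode_tiled(self, samples, tile_size=512):
        # Stand-in for the tiled decode with bounded per-tile memory.
        return ("tiled", samples)

def force_tiled_decode(vae_cls):
    """Replace decode() so every call goes through the tiled path."""
    def decode(self, samples, *args, **kwargs):
        return self.decode_tiled(samples)
    vae_cls.decode = decode

force_tiled_decode(StubVAE)
vae = StubVAE()
mode, _ = vae.decode([1, 2, 3])
print(mode)  # -> tiled
```

Applied once at startup (e.g. from a tiny custom node's `__init__.py`), every existing workflow would then decode tiled without editing the graphs.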
Existing Solutions
No response
Other
See ROCm issue over here: https://github.com/ROCm/ROCm/issues/3580
I'm having this issue, and it's a new issue. I've been using ComfyUI for about a year now without having this problem.
When I run an SDXL workflow and VAE Decode, it often reports out of memory and switches to tiled decode. This is new behavior. I have 16 GB VRAM and I'm decoding 1024x1024 latents. Usually it will still decode after the first run, but on subsequent runs it gets stuck and won't complete the decode.
6900XT, Ubuntu 24.04, torch 2.6.0.dev20241004+rocm6.2
I made a smaller reproduction case: https://github.com/ROCm/ROCm/issues/3580#issuecomment-2461073403. I tried stepping back 100 commits and testing again, and if it still crashed I reverted even further. I got back to around May of last year before a missing Python dependency stopped me from going further back.
So in short, since it was working before, it doesn't seem to be something ComfyUI has broken in their code.
My best guess is that the issue is somewhere in the AMD driver; they have an internal ticket for the bug. It's a shame that all the work is done behind closed doors, so we random noobs have no idea when, or if, any fix will be out.
There's a "VAE Decode (Tiled)" node that you can use if you want to always use tiled decode.
Yes, I know about that one. This ticket was more about getting a config option or toggle to always use the tiled node, so I don't need to remember to swap out the node when I reuse an old image that still has the non-tiled node.
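As a stopgap for old workflows, saved API-format graphs could also be rewritten in bulk. The sketch below assumes ComfyUI's API JSON layout (`{"<id>": {"class_type": ..., "inputs": {...}}}`) and that the `VAEDecodeTiled` node accepts a `tile_size` input; check the actual node inputs in your ComfyUI version before relying on this.

```python
# Hedged sketch: rewrite a saved API-format workflow so every plain
# VAEDecode node becomes a VAEDecodeTiled node.
import json

def retile_workflow(workflow: dict, tile_size: int = 512) -> dict:
    """Return a copy of the workflow with VAEDecode swapped for VAEDecodeTiled."""
    out = json.loads(json.dumps(workflow))  # cheap deep copy
    for node in out.values():
        if node.get("class_type") == "VAEDecode":
            node["class_type"] = "VAEDecodeTiled"
            # Assumed input name; verify against your node's definition.
            node["inputs"]["tile_size"] = tile_size
    return out

wf = {"8": {"class_type": "VAEDecode",
            "inputs": {"samples": ["7", 0], "vae": ["4", 2]}}}
print(retile_workflow(wf)["8"]["class_type"])  # -> VAEDecodeTiled
```

Run over a folder of exported workflow JSON files, this avoids hand-editing each graph, though workflows embedded in PNG metadata would still need re-saving.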
Just tested running a Hunyuan Video workflow on a 7900 XTX. Unfortunately, it hung on the VAE Decode (Tiled) node.
I'm also trying to get a successful generation with a Hunyuan Video workflow on a 7900 XT (Ubuntu on WSL). Generation always gets stuck on VAE decode (both tiled and default) with no further console output. Since it was a first-time generation, I let it run for a couple of hours, but unfortunately no success.
pytorch version: 2.7.0.dev20250215+rocm6.3
What happens if you add the VAE Decode Tiled node manually instead?
You could also check this out if it gets faster: https://github.com/comfyanonymous/ComfyUI/issues/5759#issuecomment-2652490678
Haven't had time to test it yet.
Will it work if you use KJ's VAE decode node? As for the VAE Decode (Tiled) node, does it still hang if you set 256, 64, 8, 8?
Using the tiled VAE decode node with 256, 64, 8, 8 actually let me finally finish a short, two-second video generation. However, the generation process still seems very unstable (for me at least), and I often have to deal with OOMs or the process just getting killed in general (32 GB RAM with Windows still running beside WSL is probably not ideal). I will try again with a couple of changes, maybe with KJ's nodes, and report back.
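For anyone puzzling over those four numbers: they presumably map to the node's tile size, overlap, temporal size, and temporal overlap (worth double-checking against the input names in your ComfyUI version). A small self-contained sketch of how overlapping tiles are laid out along one axis shows why shrinking the tile size caps the peak allocation: each decode pass only touches one tile's worth of pixels, regardless of the full frame size.

```python
def tile_starts(length, tile, overlap):
    """Start offsets for overlapping 1-D tiles covering `length` pixels."""
    stride = tile - overlap
    starts = list(range(0, max(length - overlap, 1), stride))
    # Clamp so the last tile ends exactly at `length`.
    starts = [max(min(s, length - tile), 0) for s in starts]
    return sorted(set(starts))

# A 1024-px axis with 256-px tiles and 64-px overlap:
xs = tile_starts(1024, 256, 64)
print(xs)            # -> [0, 192, 384, 576, 768]
print(len(xs) ** 2)  # -> 25 overlapping 256x256 tiles for a 1024x1024 frame
```

Each of the 25 tiles here covers 1/16 of the frame's area, so the decoder's peak activation memory drops roughly in proportion, at the cost of redundant work in the overlap bands (which exist so tile seams can be blended away).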
I have the same problem using native Ubuntu: 7900 XT, torch 2.7.0.dev20250215+rocm6.3.
VAE Decode (Tiled) with 256, 64, 64, 8 on Ubuntu 24.04.
Before this I also tried Ubuntu 22.04 + ROCm 6.1; same problem.
@Kolopsel @maomi2021 Thanks for the information!
I do have a side request though for anyone still here.
Could you do a little test on EasyAnimate v5.1 7b, on ROCm Linux? My suspicion is that it wouldn't work.
If you have Zluda installed on Windows, could you try to see if EasyAnimate will work? For this one I'm genuinely curious and have a bit of hope.
I think this might be related to this issue.
I tried Windows + Zluda, but it couldn't meet my needs: with the same video size it reported insufficient video memory right at the start and failed to run (it does run under Linux + ROCm).