
VAE memory management smart offload doesn't release enough VRAM for processing

theqmann opened this issue 1 month ago

Custom Node Testing

Expected Behavior

When the VAE decoder runs, the model memory management should free enough VRAM from the loaded diffusion model for the decode to fit without paging.

Actual Behavior

Generally the VAE loader doesn't leave enough VRAM available for the VAE to decode the latent samples, which causes GPU VRAM to page to system RAM and slows processing down immensely. I can work around the issue by adding a third-party Model Unloader node to unload the UNET model manually before VAE processing.
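The effect of the workaround can be simulated with a toy VRAM budget. Everything here is an illustrative sketch: `GpuBudget` and all sizes are hypothetical stand-ins, not ComfyUI internals, and the 18 GB decode working set is a made-up figure for a video VAE decode (weights plus activations).

```python
# Toy simulation of the manual-unload workaround. GpuBudget and all
# sizes are hypothetical illustrations, not ComfyUI internals.

class GpuBudget:
    """Tracks how many MB are resident on a fixed-size GPU."""
    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.resident_mb = 0.0

    def load(self, size_mb):
        # Refuse allocations that would spill into system RAM (paging).
        if self.resident_mb + size_mb > self.total_mb:
            raise MemoryError("would page to system RAM")
        self.resident_mb += size_mb

    def unload(self, size_mb):
        self.resident_mb = max(0.0, self.resident_mb - size_mb)

gpu = GpuBudget(total_mb=24_000)   # 4090-class card
gpu.load(15_000)                   # diffusion model fully resident

# Smart offload frees roughly half the model; a video VAE decode's
# working set (illustrative 18 GB) still does not fit.
gpu.unload(8_000)
try:
    gpu.load(18_000)
except MemoryError:
    # The workaround: unload the model completely, then decode fits.
    gpu.unload(gpu.resident_mb)
    gpu.load(18_000)
```

With the partial offload, the decode would have to page; only after the full unload does the 18 GB working set fit in the 24 GB budget.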

Steps to Reproduce

Currently running Hunyuan Video 1.5 I2V 480p on a 4090.

Debug Logs

Requested to load AutoencodingEngine
Unloaded partially: 7973.90 MB freed, 7907.86 MB remains loaded, 96.05 MB buffer reserved, lowvram patches: 0
loaded completely; 3333.51 MB usable, 2408.48 MB loaded, full load: True

Other

The model takes about 15 GB of VRAM, and the smart offload only unloads about half of it. Unloading the full 15 GB reduces the processing time by about 2-3 minutes (from 289 s to 124 s).
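One plausible way to picture the under-freeing is a planner that offloads only the shortfall between what the next step is estimated to need and what is currently free: if the estimate counts only the VAE weights (~2.4 GB in the log above) and not the decode activations, far too little gets freed. `plan_offload` and all numbers below are hypothetical, not ComfyUI's actual accounting.

```python
# Toy offload planner: free only the shortfall between the estimated
# need of the next step and currently free VRAM. All names and numbers
# are illustrative, not ComfyUI internals.

def plan_offload(model_mb, free_mb, needed_mb, buffer_mb=96.0):
    """MB of the loaded model to offload so `needed_mb` fits in VRAM."""
    shortfall = needed_mb + buffer_mb - free_mb
    return max(0.0, min(model_mb, shortfall))

# Estimate based on VAE weights alone (~2.4 GB, as in the log above):
weights_only = plan_offload(model_mb=15_000, free_mb=500, needed_mb=2_400)

# Estimate that also accounts for decode activations (illustrative 12 GB):
with_activations = plan_offload(model_mb=15_000, free_mb=500, needed_mb=12_000)
```

Under the weights-only estimate the planner frees under 2 GB and the decode pages; sizing for the full working set frees most of the model, which matches the observed speedup from unloading everything.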

theqmann avatar Dec 03 '25 08:12 theqmann

A fix for this was merged this morning. Please pull the latest git, or let us know if the issue persists after 3.77. Thanks for your report.

rattus128 avatar Dec 04 '25 23:12 rattus128

@rattus128 can you help me solve this bug?

Vijay2359 avatar Dec 05 '25 12:12 Vijay2359

Seems to work great now, thanks!

theqmann avatar Dec 05 '25 18:12 theqmann