VAE memory management smart offload doesn't release enough VRAM for processing
Custom Node Testing
- [ ] I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Expected Behavior
When running the VAE decoder, the model memory manager should leave enough extra VRAM free for the decode to complete without paging.
Actual Behavior
The VAE loader generally doesn't leave enough VRAM available for the VAE to decode the latent samples, which causes GPU VRAM to page out to system DRAM and slows processing down immensely. I can work around the issue manually by adding a third-party Model Unloader node that unloads the UNET model before VAE processing.
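The workaround above amounts to forcing a full unload where the smart offloader only does a partial one. A minimal sketch of the budgeting decision involved, using a hypothetical helper name and illustrative numbers (this is not ComfyUI's actual API):

```python
# Hypothetical sketch of the unload-budget decision, NOT ComfyUI's real API.
def vram_to_unload_mb(free_vram_mb, model_loaded_mb, vae_needed_mb, buffer_mb=100):
    """Return how many MB of the resident model must be unloaded so the
    VAE decode fits in VRAM without paging out to system DRAM."""
    shortfall = vae_needed_mb + buffer_mb - free_vram_mb
    # Never unload a negative amount, and never more than is resident.
    return max(0, min(model_loaded_mb, shortfall))

# If the budget underestimates vae_needed_mb, too little is unloaded and the
# decode pages; forcing model_loaded_mb to be unloaded entirely avoids that.
print(vram_to_unload_mb(free_vram_mb=1000, model_loaded_mb=15881, vae_needed_mb=8000))
```

The bug described here behaves as if `vae_needed_mb` (or the reserved buffer) is estimated too low, so the partial unload frees roughly half the model and the decoder still spills into system memory.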
Steps to Reproduce
Currently running Hunyuan Video 1.5 I2V 480p on a 4090.
Debug Logs
Requested to load AutoencodingEngine
Unloaded partially: 7973.90 MB freed, 7907.86 MB remains loaded, 96.05 MB buffer reserved, lowvram patches: 0
loaded completely; 3333.51 MB usable, 2408.48 MB loaded, full load: True
Other
The model takes about 15 GB of VRAM, and only about half of it gets unloaded. Unloading the full 15 GB reduces the processing time by about 2-3 minutes (from 289 sec to 124 sec).
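As a quick sanity check on the debug log above, the freed and still-resident amounts from the partial unload should sum to the full model size (a back-of-the-envelope check, not part of any fix):

```python
# Figures copied from the debug log in this report.
freed_mb = 7973.90     # "Unloaded partially: 7973.90 MB freed"
resident_mb = 7907.86  # "7907.86 MB remains loaded"

total_gb = (freed_mb + resident_mb) / 1024
print(f"model size ~= {total_gb:.1f} GB")  # ~= 15.5 GB, matching "about 15 GB"
```

This confirms the "unloads about half" observation: roughly 7.9 GB of a ~15.5 GB model stays resident during the decode.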
A fix for this was merged this morning. Please pull the latest git, or let us know if the issue persists after 3.77. Thanks for your report.
@rattus128 Can you help me solve this bug?
Seems to work great now, thanks!