Memory Optimization issue since "Better Flux vram estimation" update
Expected Behavior
Being able to render
Actual Behavior
The PC slows down at the KSampler step and nothing happens. If I wait a while, a memory allocation error message appears (RTX 4090 / 64 GB RAM).
Steps to Reproduce
Simply update ComfyUI. The problem first appears at commit 47da42d9283815a58636bd6b42c0434f70b24c9c.
Debug Logs
Nothing, as nothing happens.
Other
Using KSamplerAdvancedProgress //Inspire with the AYS scheduler and the LCM sampler.
To use Flux you need the [ SamplerCustomAdvanced ] node
https://comfyanonymous.github.io/ComfyUI_examples/flux/
> To use Flux you need the [ SamplerCustomAdvanced ] node
> https://comfyanonymous.github.io/ComfyUI_examples/flux/
I don't use Flux
OK, I see... so this happens on non-Flux models, after the "Better Flux vram estimation" update.
I just tried the KSampler Advanced Progress (Inspire) node, with SD1.5, and it works for me. I'm on Windows 10.
Thank you @JorgeR81. So maybe the scheduler is the problem; I'm using AYS 1.5 with LCM:
I'm using two of them; the second one upscales the first pass. This is the first one, where the problem occurs. Maybe something in the ComfyUI update doesn't play well with the AYS scheduler, or specifically with the Inspire node?
Can you test if things are improved now?
> Can you test if things are improved now?
I'm in the middle of a render; I'll let you know in a few minutes and will edit this message. Thank you!
> Can you test if things are improved now?
Still not fixed; it looks like an OOM error:
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/sampler_helpers.py#L64
If you change:

```python
memory_required = model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory
```

to:

```python
memory_required = model.memory_required([noise_shape[0] * 4] + list(noise_shape[1:])) + inference_memory
```

Does that fix it?
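For context, here's a minimal sketch of what that line computes (the wrapper function below is illustrative, not ComfyUI's actual API; only `memory_required`, `noise_shape`, and `inference_memory` appear in the linked file):

```python
# Minimal sketch (not ComfyUI's actual code): before sampling, the latent
# shape is padded along the batch dimension to approximate the peak batch
# seen during inference (e.g. cond + uncond), and the model's own
# memory_required() heuristic turns that shape into a byte count used to
# decide how much to unload/offload first.

def estimate_sampling_memory(model, noise_shape, inference_memory, batch_multiplier=2):
    padded_shape = [noise_shape[0] * batch_multiplier] + list(noise_shape[1:])
    return model.memory_required(padded_shape) + inference_memory

# Raising batch_multiplier from 2 to 4 (the tweak suggested above) simply
# makes the estimate twice as conservative.
```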
I'm going to test, but I noticed a weird thing: if I revert to the last working ComfyUI commit (for me it's 17bbd83176268c76a8597bb3a88768d325536651) without also reverting the Inspire Pack to its matching commit, I also get an error.

As for the suggested tweak:

```python
memory_required = model.memory_required([noise_shape[0] * 4] + list(noise_shape[1:])) + inference_memory
```

And yes, the same OOM error appears with it too.
As mentioned here after many tests (https://github.com/ltdrdata/ComfyUI-Inspire-Pack/issues/135):

Everything works fine when I have these commits:

- ComfyUI: 17bbd83176268c76a8597bb3a88768d325536651
- Inspire Pack: https://github.com/ltdrdata/ComfyUI-Inspire-Pack/commit/cf9bae0718c42077722a66269d6b4f2424b255c2

Updates after these commits crash the KSamplerAdvancedProgress //Inspire node when using the AYS scheduler with the LCM sampler. Hope it helps; I'm reverting back to these commits while waiting for a fix. Thank you!
It's kinda fun to have Flux cause issues for people not even inferencing it. Everything was fine before it was implemented. Oh well.
There have been some optimizations to the lowvram mode which should speed things up for most people. If you have issues, you need to post your hardware and which model you are using.
Cascade is still 1.5-2 seconds per iteration slower on DirectML on a 4 GB RX 570. I don't expect a meaningful fix, because DirectML support feels a bit like an afterthought at this point: to even work on 4 GB it requires a line change in model_management.py (`lowvram_available = False #TODO: need to find a way to get free memory in directml before this can be enabled by default.`; setting this to `True` made 4 GB seemingly work for everything from SD, SDXL, and SD3 to Cascade, with performance comparable to ROCm on Linux if not slightly better). ROCm on Linux was already slower in general before the changes. I'll just chalk it up to user error; it's probably unreasonable to expect low-end hardware edge cases to be supported.
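For reference, a sketch of the kind of edit described, assuming the flag sits behind a DirectML check roughly like this (the quoted line and TODO come from the comment above; the surrounding structure is approximate, not the file's exact code):

```python
# Sketch of the workaround described above, as it would sit in
# comfy/model_management.py (the exact surrounding code is approximate).
directml_enabled = True  # in the real file this depends on the --directml flag

if directml_enabled:
    # Upstream default, per the quoted TODO: free VRAM can't be queried
    # under DirectML, so lowvram mode stays disabled:
    # lowvram_available = False
    lowvram_available = True  # the commenter's edit: reported to make 4 GB cards usable
```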
Edit: this issue was not present up to a6decf1e620907347c9c5d8c815172f349b19c21; I think it may be something unrelated to the lowvram changes. I'm trying to pin down which commit the problems actually start at.
Edit 2: https://github.com/comfyanonymous/ComfyUI/commit/d420bc792af0b61a6ef7410c65fa2d4dcc646c56 is the exact commit where the speed degradation on Cascade starts for me.
> There have been some optimizations to the lowvram mode which should speed things up for most people. If you have issues, you need to post your hardware and which model you are using.
I use an RTX 4090 + 64 GB RAM. Before the update, everything worked perfectly. Until yesterday, on my current project I could render batches of 200 to 250 frames (depending on the number of ControlNets I was using) at a resolution of 1256x712. If I update, I can't even render 50 frames, sadly. The problem I mention is not visible if you only render one frame; it concerns videos. Maybe I should have mentioned that.
Does the change I had you make above have any effect on the number of frames you can do?
> Does the change I had you make above have any effect on the number of frames you can do?
No, I tested them all and they have no effect; the PC barely freezes and I get the same OOM message.
> Edit 2: d420bc7 is the exact commit where the speed degradation on Cascade starts for me.
My speed and OOM issues go away on the latest commit by simply reverting model_base.py to https://github.com/comfyanonymous/ComfyUI/raw/1589b58d3e29e44623a1f3f595917b98f2301c3e/comfy/model_base.py
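For example, one way to pin just that file (a sketch; equivalent to downloading it manually, and worth backing up the current copy first):

```python
# Fetch the old revision of model_base.py linked above and overwrite the
# local copy. Run from the root of the ComfyUI checkout.
import urllib.request

url = ("https://github.com/comfyanonymous/ComfyUI/raw/"
       "1589b58d3e29e44623a1f3f595917b98f2301c3e/comfy/model_base.py")
urllib.request.urlretrieve(url, "comfy/model_base.py")
```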
AFAIK my issues weren't caused by the lowvram changes in model_management.py at all, just by d420bc792af0b61a6ef7410c65fa2d4dcc646c56.
Not sure if this is the culprit of the problem in this issue thread specifically, but I narrowed it down in my own situation. More specifically, I can revert two lines and my problem goes away while still remaining on the latest commit:
```diff
             area = input_shape[0] * math.prod(input_shape[2:])
-            return (area * comfy.model_management.dtype_size(dtype) * 0.01 * self.memory_usage_factor) * (1024 * 1024)
+            return (area * comfy.model_management.dtype_size(dtype) / 50) * (1024 * 1024)
         else:
             #TODO: this formula might be too aggressive since I tweaked the sub-quad and split algorithms to use less memory.
             area = input_shape[0] * math.prod(input_shape[2:])
-            return (area * 0.15 * self.memory_usage_factor) * (1024 * 1024)
+            return (((area * 0.6) / 0.9) + 1024) * (1024 * 1024)
```
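To make the two heuristics easier to compare, here they are written out as standalone functions (a readability sketch only; in comfy/model_base.py this is a method, and the dtype/attention branching is more involved):

```python
import math

MB = 1024 * 1024

def estimate_pre_d420bc7(input_shape, dtype_size):
    # The restored ("+") formula: scales only with latent area and dtype size.
    area = input_shape[0] * math.prod(input_shape[2:])
    return (area * dtype_size / 50) * MB

def estimate_post_d420bc7(input_shape, dtype_size, memory_usage_factor):
    # The reverted ("-") formula: adds a per-model memory_usage_factor, so a
    # model with a large factor gets a much bigger (possibly over-conservative)
    # estimate, triggering more offloading and slower iterations.
    area = input_shape[0] * math.prod(input_shape[2:])
    return (area * dtype_size * 0.01 * memory_usage_factor) * MB
```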
I did try adding memory_usage_factors for Cascade in supported_models.py, but couldn't achieve much beyond preventing OOMs; the speed problem remained.
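For anyone curious what that attempt looks like, a hypothetical sketch (the class names and value are illustrative, not necessarily ComfyUI's actual definitions in supported_models.py):

```python
# Hypothetical sketch of the attempt described above: giving Cascade its own
# memory_usage_factor so memory_required() scales its estimate per model.
class BASE:  # stand-in for comfy.supported_models_base.BASE
    memory_usage_factor = 1.0

class Stable_Cascade_C(BASE):
    # Larger values make the VRAM estimate more conservative (fewer OOMs,
    # but more offloading and therefore slower); smaller values do the opposite.
    memory_usage_factor = 3.0
```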
Nice that you solved your problem; unfortunately it doesn't fix the problem here. Thanks for sharing, though!
EDIT: after many new tests following all the different repo updates, ComfyUI no longer seems to be the problem. The problem apparently comes from the Impact Pack. Waiting for an answer from ltdrdata to get confirmation before closing this thread.
I have the same issue. Running ComfyUI alone solves the problem.