Memory Optimization issue since "Better Flux vram estimation" update
Expected Behavior
Being able to render
Actual Behavior
The PC slows down at the KSampler step and nothing happens. If I wait a while, a memory allocation error message appears (RTX 4090 / 64 GB RAM).
Steps to Reproduce
Simply update ComfyUI. The problem first appears at commit 47da42d9283815a58636bd6b42c0434f70b24c9c.
Debug Logs
Nothing, as nothing happens.
Other
Using KSamplerAdvancedProgress //Inspire with the AYS scheduler and the LCM sampler.
To use Flux you need the [ SamplerCustomAdvanced ] node
https://comfyanonymous.github.io/ComfyUI_examples/flux/
> To use Flux you need the [ SamplerCustomAdvanced ] node
> https://comfyanonymous.github.io/ComfyUI_examples/flux/
I don't use Flux
OK, I see... so this happens on non-Flux models, after the "Better Flux vram estimation" update.
I just tried the KSampler Advanced Progress (Inspire) node, with SD1.5, and it works for me. I'm on Windows 10.
Thank you @JorgeR81. So maybe the scheduler is the problem; I'm using AYS 1.5 with LCM:
I'm using two of them; the second one upscales the first pass. This is the first one, where the problem occurs. Maybe something in the ComfyUI update doesn't play well with the AYS scheduler, or specifically with the Inspire node?
Can you test if things are improved now?
> Can you test if things are improved now?
I'm in the middle of a render; I'll let you know in a few minutes and will edit this message. Thank you!
> Can you test if things are improved now?
Still not fixed; it looks like an OOM error:
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/sampler_helpers.py#L64
If you change:

```python
memory_required = model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory
```

to:

```python
memory_required = model.memory_required([noise_shape[0] * 4] + list(noise_shape[1:])) + inference_memory
```

Does that fix it?
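For context, here's a minimal sketch of what that line computes (the wrapper function below is illustrative, not ComfyUI's actual API; only `memory_required`, `noise_shape`, and `inference_memory` appear in the linked file):

```python
# Minimal sketch (not ComfyUI's actual code): before sampling, the latent
# shape is padded along the batch dimension to approximate the peak batch
# seen during inference (e.g. cond + uncond), and the model's own
# memory_required() heuristic turns that shape into a byte count used to
# decide how much to unload/offload first.

def estimate_sampling_memory(model, noise_shape, inference_memory, batch_multiplier=2):
    padded_shape = [noise_shape[0] * batch_multiplier] + list(noise_shape[1:])
    return model.memory_required(padded_shape) + inference_memory

# Raising batch_multiplier from 2 to 4 (the tweak suggested above) simply
# makes the estimate twice as conservative.
```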
I'm going to test, but I noticed a weird thing: if I revert to the last working ComfyUI commit (for me it's 17bbd83176268c76a8597bb3a88768d325536651) without also reverting the Inspire Pack to its matching commit, I also get an error.

As for the suggested tweak:

```python
memory_required = model.memory_required([noise_shape[0] * 4] + list(noise_shape[1:])) + inference_memory
```

And yes, the same OOM error appears with it too.
As mentioned here after many tests (https://github.com/ltdrdata/ComfyUI-Inspire-Pack/issues/135):

Everything works fine when I have these commits:

- ComfyUI: 17bbd83176268c76a8597bb3a88768d325536651
- Inspire Pack: https://github.com/ltdrdata/ComfyUI-Inspire-Pack/commit/cf9bae0718c42077722a66269d6b4f2424b255c2

Updates after these commits crash the KSamplerAdvancedProgress //Inspire node when using the AYS scheduler with the LCM sampler. Hope it helps; I'm reverting back to these commits while waiting for a fix. Thank you!
It's kinda fun to have Flux cause issues for people not even inferencing it. Everything was fine before it was implemented. Oh well.
There have been some optimizations to the lowvram mode which should speed things up for most people. If you have issues, you need to post your hardware and which model you are using.
Cascade is still 1.5-2 seconds per iteration slower on DirectML on a 4 GB RX 570. I don't expect a meaningful fix, because DirectML support feels a bit like an afterthought at this point: to even work on 4 GB it requires a line change in model_management.py (`lowvram_available = False #TODO: need to find a way to get free memory in directml before this can be enabled by default.`; setting this to `True` made 4 GB seemingly work for everything from SD, SDXL, and SD3 to Cascade, with performance comparable to ROCm on Linux if not slightly better). ROCm on Linux was already slower in general before the changes. I'll just chalk it up to user error; it's probably unreasonable to expect low-end hardware edge cases to be supported.
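For reference, a sketch of the kind of edit described, assuming the flag sits behind a DirectML check roughly like this (the quoted line and TODO come from the comment above; the surrounding structure is approximate, not the file's exact code):

```python
# Sketch of the workaround described above, as it would sit in
# comfy/model_management.py (the exact surrounding code is approximate).
directml_enabled = True  # in the real file this depends on the --directml flag

if directml_enabled:
    # Upstream default, per the quoted TODO: free VRAM can't be queried
    # under DirectML, so lowvram mode stays disabled:
    # lowvram_available = False
    lowvram_available = True  # the commenter's edit: reported to make 4 GB cards usable
```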
Edit: this issue was not present up to a6decf1e620907347c9c5d8c815172f349b19c21; I think it may be something unrelated to the lowvram changes. I'm trying to pin down which commit the problems actually start at.
Edit 2: https://github.com/comfyanonymous/ComfyUI/commit/d420bc792af0b61a6ef7410c65fa2d4dcc646c56 is the exact commit where the speed degradation on Cascade starts for me.
> There have been some optimizations to the lowvram mode which should speed things up for most people. If you have issues, you need to post your hardware and which model you are using.
I use an RTX 4090 + 64 GB RAM. Before the update, everything worked perfectly. Until yesterday, on my current project I could render batches of 200 to 250 frames (depending on the number of ControlNets I was using) at a resolution of 1256x712. If I update, I can't even render 50 frames, sadly. The problem I mention is not visible if you only render one frame; it concerns videos. Maybe I should have mentioned that.
Does the change I had you make above have any effect on the number of frames you can do?
> Does the change I had you make above have any effect on the number of frames you can do?
No, I tested them all and they have no effect; the PC barely freezes and I get the same OOM message.
> Edit 2: d420bc7 is the exact commit where the speed degradation on Cascade starts for me.
My speed and OOM issues go away on the latest commit by simply reverting model_base.py to https://github.com/comfyanonymous/ComfyUI/raw/1589b58d3e29e44623a1f3f595917b98f2301c3e/comfy/model_base.py
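For example, one way to pin just that file (a sketch; equivalent to downloading it manually, and worth backing up the current copy first):

```python
# Fetch the old revision of model_base.py linked above and overwrite the
# local copy. Run from the root of the ComfyUI checkout.
import urllib.request

url = ("https://github.com/comfyanonymous/ComfyUI/raw/"
       "1589b58d3e29e44623a1f3f595917b98f2301c3e/comfy/model_base.py")
urllib.request.urlretrieve(url, "comfy/model_base.py")
```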
AFAIK my issues weren't caused by the lowvram changes in model_management.py at all, just by d420bc792af0b61a6ef7410c65fa2d4dcc646c56.
Not sure if this is the culprit of the problem in this issue thread specifically, but I narrowed it down in my own situation. More specifically, I can revert two lines and my problem goes away while still remaining on the latest commit:
```diff
             area = input_shape[0] * math.prod(input_shape[2:])
-            return (area * comfy.model_management.dtype_size(dtype) * 0.01 * self.memory_usage_factor) * (1024 * 1024)
+            return (area * comfy.model_management.dtype_size(dtype) / 50) * (1024 * 1024)
         else:
             #TODO: this formula might be too aggressive since I tweaked the sub-quad and split algorithms to use less memory.
             area = input_shape[0] * math.prod(input_shape[2:])
-            return (area * 0.15 * self.memory_usage_factor) * (1024 * 1024)
+            return (((area * 0.6) / 0.9) + 1024) * (1024 * 1024)
```
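To make the two heuristics easier to compare, here they are written out as standalone functions (a readability sketch only; in comfy/model_base.py this is a method, and the dtype/attention branching is more involved):

```python
import math

MB = 1024 * 1024

def estimate_pre_d420bc7(input_shape, dtype_size):
    # The restored ("+") formula: scales only with latent area and dtype size.
    area = input_shape[0] * math.prod(input_shape[2:])
    return (area * dtype_size / 50) * MB

def estimate_post_d420bc7(input_shape, dtype_size, memory_usage_factor):
    # The reverted ("-") formula: adds a per-model memory_usage_factor, so a
    # model with a large factor gets a much bigger (possibly over-conservative)
    # estimate, triggering more offloading and slower iterations.
    area = input_shape[0] * math.prod(input_shape[2:])
    return (area * dtype_size * 0.01 * memory_usage_factor) * MB
```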
I did try adding memory_usage_factors for Cascade in supported_models.py, but couldn't achieve much beyond preventing OOMs; the speed problem remained.
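For anyone curious what that attempt looks like, a hypothetical sketch (the class names and value are illustrative, not necessarily ComfyUI's actual definitions in supported_models.py):

```python
# Hypothetical sketch of the attempt described above: giving Cascade its own
# memory_usage_factor so memory_required() scales its estimate per model.
class BASE:  # stand-in for comfy.supported_models_base.BASE
    memory_usage_factor = 1.0

class Stable_Cascade_C(BASE):
    # Larger values make the VRAM estimate more conservative (fewer OOMs,
    # but more offloading and therefore slower); smaller values do the opposite.
    memory_usage_factor = 3.0
```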
Nice that you solved your problem; unfortunately it doesn't fix the problem here. Thanks for sharing, though!
EDIT: after many new tests following all the different repo updates, ComfyUI no longer seems to be the problem. The problem apparently comes from the Impact Pack. Waiting for an answer from ltdrdata to get confirmation before closing this thread.
I have the same issue. Running ComfyUI alone solves the problem.