Excessive Memory Overcommit on High-RAM Systems
ComfyUI commits significantly more virtual memory than it actually uses on high-RAM systems. This forces users to maintain unnecessarily large pagefiles and can cause OOM crashes despite abundant available physical memory.
With a 20GB pagefile (80GB total virtual memory), ComfyUI uses 45GB of actual memory but commits 75GB of virtual memory. The application crashes with OOM errors when committed memory hits 80GB/80GB, even though there is still ~30GB of RAM available to use. It is committing roughly 67% more memory than it actually uses.
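For anyone who wants to check the same numbers on their own machine, here is a minimal psutil sketch (psutil is not part of ComfyUI, and the PID below is a placeholder) that compares the working set to the commit charge of the ComfyUI process on Windows:

```python
# Sketch: compare physical memory actually in use (working set) with committed
# virtual memory (pagefile-backed commit charge) for a running ComfyUI process.
# psutil is a third-party package; the PID is a placeholder you have to fill in.
import psutil

PID = 12345  # hypothetical: replace with the ComfyUI process ID from Task Manager

mem = psutil.Process(PID).memory_info()
gib = 1024 ** 3

print(f"working set (RAM in use):  {mem.rss / gib:.1f} GiB")
# On Windows, psutil reports the process pagefile usage (its commit charge) as vms.
print(f"committed virtual memory:  {mem.vms / gib:.1f} GiB")
print(f"overcommit:                {(mem.vms - mem.rss) / gib:.1f} GiB")
```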
Around model_management.py:598 there appears to be a 10% safety margin built in, which on a 64GB system should only add ~6GB (or maybe ~8GB if calculated against total virtual memory including the pagefile). However, the actual overcommit is 30GB, which is 46% of my total physical RAM. This suggests percentage-based allocation doesn't scale appropriately for high-memory systems.
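To spell out that arithmetic (rough numbers only; the exact formula in model_management.py may differ):

```python
# Rough arithmetic only; the real margin in model_management.py may be computed
# differently. The point is that a flat 10% margin cannot explain a 30GB gap.
usable_ram_gb = 62                      # physical RAM after iGPU allocation
virtual_total_gb = usable_ram_gb + 20   # plus the 20GB pagefile

margin_vs_ram = 0.10 * usable_ram_gb         # ~6.2GB
margin_vs_virtual = 0.10 * virtual_total_gb  # ~8.2GB
observed_overcommit_gb = 75 - 45             # committed minus actually used = 30GB

print(margin_vs_ram, margin_vs_virtual, observed_overcommit_gb)
```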
Edit: specifically, I'm seeing this allocation with the native Wan ComfyUI workflow.
Expected Behavior
More RAM should reduce memory pressure and overcommit
Actual Behavior
More RAM leads to a proportionally larger overcommit (67% waste on a 64GB system)
System
ComfyUI: 0.3.35
ComfyUI_frontend: v1.20.4
RAM: 64GB (62GB usable after iGPU allocation)
GPU: RTX 3090 (24GB VRAM)
OS: Windows 11
The model_management code only deals with VRAM, not RAM. The excessive RAM usage is not a ComfyUI problem.
Where does the issue come from, then? The RAM gets allocated when I run the model and doesn't seem to clear out. I believe models unloaded from VRAM are moved into RAM; maybe that isn't being deallocated properly? I'll do some more tests and see what I can find out.
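As a baseline for tests like this, the generic PyTorch cleanup pattern looks something like the sketch below. This is not ComfyUI's internal unload path; if ComfyUI or a custom node still holds a reference to the offloaded weights, none of this will release the system RAM, which is exactly the thing to check for.

```python
# Generic PyTorch cleanup sketch, not ComfyUI's unload code. "model" is a
# stand-in; in ComfyUI the open question is whether something else still
# holds a reference to weights that were offloaded from VRAM to RAM.
import gc
import torch

model = torch.nn.Linear(4096, 4096)  # stand-in for an offloaded model

del model                        # drop the only Python reference to the weights
gc.collect()                     # collect the now-unreachable objects
if torch.cuda.is_available():
    torch.cuda.empty_cache()     # release cached CUDA blocks (VRAM side only)
```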
Seems this is blocked by PyTorch: https://github.com/pytorch/pytorch/issues/12873
It has known issues with massive memory overallocation. I've been paying attention to it lately, and it's allocating close to 40GB more than is actually being used, which is a massive issue...
I have the same problem with HiDream but Flux works well.
As soon as the KSampler starts to generate an image, RAM usage goes to 100% and the graphics card is hardly used at all. The PC starts to stutter until everything freezes, then it slowly recovers, and the image generation stops with an allocation error; the console says "torch.OutOfMemoryError: Allocation on device".
System
ComfyUI: 0.3.38
ComfyUI_frontend: v1.20.7
CPU: Ryzen 9 7950X3D
RAM: 64GB
GPU: RTX 4090 (24GB VRAM)
OS: Windows 11
I also have this problem with KSampler and Wan2.2. System RAM is overused and doesn't seem to be cleaned up, while VRAM is still cleaned up after the generation process completes.
I have the same issue too, with 48GB of DRAM.
Came here looking for a solution. I'm running a simple Wan2.2 4K 5s generation (RTX 5090 + 96GB DDR5) and I hit a wall today where nothing I do will keep system RAM utilization from spiking up to 89GB and flagging the process for the Linux OOM killer to slay. I ran 50 7s generations at 1MP overnight without any problem. Something obviously changed on my system that is causing it to peak past the OOM killer's threshold. I mention these details to say that it doesn't appear to be a leak per se. It just doesn't draw my attention unless the process crashes, and right now it can't stay up and complete a job.
With the comfy process crashed, utilization is 3.28G/92.0G. If I drop my 4K upscaler, I can stay under the OOM killer's radar, but I shouldn't have to. When I enter my post-processing stage, I no longer need WAN2.2-I2V-14B taking up 53GB of system RAM. I want to be able to make it go poof and clear that headroom for more processing.
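One thing worth testing: if your build exposes the POST /free route (ComfyUI's API has had one that accepts unload_models / free_memory flags, and the frontend's unload button uses something like it), a script could request an unload between pipeline stages. A sketch, treating the route, its flags, and the default server address as assumptions to verify against your server:

```python
# Sketch: ask a locally running ComfyUI server to unload models and free cached
# memory between pipeline stages. Assumes the server exposes POST /free with
# "unload_models" / "free_memory" flags and listens on the default 127.0.0.1:8188.
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8188/free",
    data=json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 200 means the server accepted the request
```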
Has anyone noticed any movement on this issue?