
RAM usage goes above 32 GB when loading the checkpoint versions of the Flux models.

Open · JorgeR81 opened this issue 1 year ago · 7 comments

Expected Behavior

Keep RAM usage below the 32 GB limit, to avoid swapping to the page file and wearing down my SSD.

Actual Behavior

I have 8 GB VRAM and 32 GB RAM, and I'm on Windows 10.

With the full-size fp16 models, my RAM usage goes above the limit when the models are loaded. It still works, but the available SSD space goes down while it does.

This is normal, I guess, considering the model sizes.

But it also happens with the (fp8) Comfy-org checkpoint models (17.2 GB).

Steps to Reproduce

I used the default workflow.

More details in this discussion, with Task Manager images: https://github.com/comfyanonymous/ComfyUI/discussions/4226
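In case it helps reproduce the numbers, here is a minimal sketch (not part of ComfyUI) that watches the process RSS with psutil while the default workflow runs; the PID has to be passed in by hand and the polling interval is arbitrary:

```python
import sys
import time

import psutil


def watch_peak_rss(pid: int, interval: float = 0.5) -> None:
    """Poll the resident set size (RSS) of a process and print the running peak.

    Run this in a separate terminal, passing the PID of the ComfyUI process,
    while the default Flux workflow is executing.
    """
    proc = psutil.Process(pid)
    peak = 0
    try:
        while proc.is_running():
            rss = proc.memory_info().rss
            peak = max(peak, rss)
            print(f"RSS {rss / 1024**3:5.1f} GB | peak {peak / 1024**3:5.1f} GB", end="\r")
            time.sleep(interval)
    except psutil.NoSuchProcess:
        pass  # ComfyUI exited; the last printed peak is the number of interest


if __name__ == "__main__":
    watch_peak_rss(int(sys.argv[1]))  # usage: python watch_rss.py <comfyui_pid>
```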

JorgeR81 · Aug 06 '24 08:08

Same thing happening here. The model I'm using is only 17.2 GB, but it tries to fill up all my RAM before it even tries to use the GPU. I'm so tired of requirements increasing exponentially in AI. Feels like it's designed to be used online only so you're a slave to their GPU clusters.

NoMansPC · Aug 06 '24 11:08

It's likely doing some kind of casting up to float32 or 16 and then back down to fp8, even if you're using an fp8 version of the model. It might not be the transformer though, maybe it's doing it for the t5 or something. I haven't actually checked to verify though.
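Rough arithmetic for what one copy of the Flux transformer weights costs at each dtype (the ~11.9B parameter count is just an estimate from the 23.8 GB fp16 file, and torch.float8_e4m3fn needs a fairly recent PyTorch); if the loader ever holds an fp32 copy, or the original plus the cast copy at once, that alone would explain blowing past 32 GB:

```python
import torch

# Approximate size of one copy of the Flux transformer weights per dtype.
# ~11.9B parameters is an estimate based on the 23.8 GB fp16 checkpoint.
N_PARAMS = 11_900_000_000

for dtype in (torch.float8_e4m3fn, torch.float16, torch.bfloat16, torch.float32):
    gib = N_PARAMS * (torch.finfo(dtype).bits // 8) / 1024**3
    print(f"{str(dtype):24s} ~{gib:5.1f} GB")
```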

RandomGitUser321 · Aug 07 '24 04:08

Here is a summary of my observations, in case it helps.

When I use the fp16 models (and the t5 also in fp16):

  • When the UNet is loading, I run out of RAM for a moment, but then usage drops below the limit again.
  • Then, when the text encoder is loading, I run out of RAM again, but also only temporarily.
  • When I'm generating, I'm at ~20 GB RAM and ~7.2 GB VRAM usage.
  • At idle, after generating, I'm at about ~26 GB RAM and ~1 GB VRAM usage.
  • But if I change the prompt, I also run out of RAM, temporarily.

With the Comfy-org Flux checkpoint:

  • When the checkpoint is loading, I run out of RAM for a moment, but then usage drops below the limit again.
  • When I'm generating, I'm at ~14 GB RAM and ~7.2 GB VRAM usage.
  • At idle, after generating, I'm at about ~20 GB RAM and ~1 GB VRAM usage.
  • I can change the prompt without running out of RAM.
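For reference, these numbers came from Task Manager; here is a minimal sketch of how the same RAM / VRAM readings could be grabbed from Python (psutil for system RAM, torch.cuda.mem_get_info for the GPU), in case anyone wants to compare on their own machine:

```python
import psutil
import torch


def snapshot(label: str) -> None:
    """Print current system-wide RAM usage and total VRAM usage on GPU 0."""
    vm = psutil.virtual_memory()
    ram_gb = (vm.total - vm.available) / 1024**3
    line = f"{label}: RAM {ram_gb:.1f} GB"
    if torch.cuda.is_available():
        free_b, total_b = torch.cuda.mem_get_info(0)
        line += f" | VRAM {(total_b - free_b) / 1024**3:.1f} GB"
    print(line)


snapshot("idle")  # call again while a prompt is running to compare
```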

JorgeR81 · Aug 07 '24 07:08

Here are some observations from other users with more RAM:
https://github.com/comfyanonymous/ComfyUI/discussions/4173#discussioncomment-10247249
https://github.com/comfyanonymous/ComfyUI/pull/3649#issuecomment-2270469986

JorgeR81 · Aug 07 '24 08:08

Yeah I think I was on to something about it upcasting: supported_inference_dtypes = [torch.bfloat16, torch.float32]

https://github.com/comfyanonymous/ComfyUI/blob/1c08bf35b49879115dedd8ec6bc92d9e8d8fd871/comfy/supported_models.py#L631

RandomGitUser321 · Aug 07 '24 09:08

Even if fp8 is not possible, just supporting / upcasting to fp16 would be a good improvement. I think it's probably upcasting to fp32 in all cases right now, while loading.

The fp16 model is 23.8 GB. When the UNet is loading, I start with ~4 GB RAM usage, and I still run out of RAM even before the text encoder starts loading. This also happens if I set the weight_dtype to fp8 in the UNet loader node, and even if I start ComfyUI with --force-fp16.
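Just to illustrate what I mean by upcasting to fp16 on load (this is not how ComfyUI actually loads checkpoints, and the file name below is only a placeholder): reading the safetensors file tensor by tensor and casting each weight straight to fp16 would keep peak RAM near the fp16 footprint, instead of ever needing a full fp32 copy of every weight:

```python
import torch
from safetensors import safe_open


def load_state_dict_fp16(path: str) -> dict:
    """Read a .safetensors checkpoint one tensor at a time, casting to fp16.

    Each tensor is cast as soon as it is read, so peak RAM stays close to the
    fp16 size of the model (plus one tensor), rather than requiring a full
    fp32 copy of the whole state dict at once.
    """
    state_dict = {}
    with safe_open(path, framework="pt", device="cpu") as f:
        for name in f.keys():
            state_dict[name] = f.get_tensor(name).to(torch.float16)
    return state_dict


# sd = load_state_dict_fp16("flux1-dev.safetensors")  # placeholder file name
```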

JorgeR81 · Aug 07 '24 13:08

I have this problem too: it blows up my RAM and swap. If I don't run python main.py --use-split-cross-attention, it crashes the whole Ubuntu OS. If I do run it with that flag, I get stuck at 32 GB of RAM used plus 4 GB of frozen swap, it hangs at the VAE stage, and I can't generate.

KEDI103 · Aug 07 '24 18:08