Recent memory changes cause system freeze

Open gitmylo opened this issue 3 weeks ago • 10 comments

Custom Node Testing

Expected Behavior

I expect VRAM offloading to move unused models to CPU RAM, and to unload them entirely if they don't fit.
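The expected behavior described above can be sketched as a small decision function. This is only an illustration of the policy, not ComfyUI's actual memory management code; the function name, the return values and the 4 GiB safety reserve are all assumptions made for the example.

```python
GIB = 1024 ** 3


def place_unused_model(model_bytes, free_ram_bytes, ram_reserve_bytes=4 * GIB):
    """Decide what to do with a model that is no longer needed in VRAM.

    Hypothetical policy: offload to CPU RAM only if the model fits while
    leaving `ram_reserve_bytes` free for the OS; otherwise unload it so the
    system never has to spill into the pagefile.
    """
    usable_ram = free_ram_bytes - ram_reserve_bytes
    if model_bytes <= usable_ram:
        return "offload"
    return "unload"


if __name__ == "__main__":
    # A 14 GiB model with 20 GiB of free RAM fits next to the 4 GiB reserve.
    print(place_unused_model(14 * GIB, 20 * GIB))
    # The same model with only 10 GiB free should be dropped instead.
    print(place_unused_model(14 * GIB, 10 * GIB))
```

The reserve is the tunable part: a larger value trades more reloads from disk for less risk of swapping.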

Actual Behavior

The model gets offloaded to CPU RAM even if it doesn't fit (usage shows as 48/48 GB CPU RAM), so my PC falls back on the swapfile, which causes it to freeze.

In earlier versions the expected behavior seemed to be what happened; now the model is always forced into CPU RAM.

Steps to Reproduce

I have 16 GB VRAM and 48 GB CPU RAM, and I'm running Windows 11.

  1. Load a memory-intensive workflow (wan 2.2 a14b, for example; the fp8 version still works for reproducing this, since the text encoder and VAE are loaded as well)
  2. Run the workflow and watch the CPU memory usage: it fills up to 100%, then the PC slows down dramatically

Previous versions would have unloaded the model. I first noticed this issue yesterday morning.

Debug Logs

The computer freezes so I can't get the logs. Although I doubt anything was logged anyway.

Other

This seems to have started around yesterday morning (somewhere before 5ebcab3c7d974963a89cecd37296a22fdb73bd2b). Generating with large models like wan 2.2 a14b (which originally worked fine in fp8 thanks to unloading) has become unusable because of the freezes.

gitmylo avatar Nov 12 '25 11:11 gitmylo

It appears to be related to the pagefile. Originally Comfy unloaded models when they didn't fit into RAM; now it appears to use the full RAM plus the pagefile. My pagefile was on my slow HDD, which caused my PC to freeze up. After I moved it to my NVMe M.2 SSD the system no longer freezes. Still, it would be nice to have a launch option or similar to avoid using the pagefile when possible.

gitmylo avatar Nov 13 '25 09:11 gitmylo

As far as I remember, the page file was always involved, depending on how big the safetensors are and how much RAM you have. Because I am stubborn enough to use only fp16 instead of fp8 wan 2.2 models, I had to buy an NVMe drive just for page swapping: my 64 GB of RAM couldn't handle 2 x 30 GB safetensors (not to mention loras) without swapping heavily. Before that I kept the swap file on a SATA SSD, but that was a "conflict of interests" because the same drive was loading the models and paging at the same time, which was suboptimal. I never dared to put the page file on my mechanical HDD; seeing how long it takes to transfer a 30 GB file between HDD and SSD, I would probably still be waiting for last month's prompt to finish if I had done such a thing.
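The arithmetic behind that comment can be written out as a rough estimate. This is a back-of-the-envelope sketch, not a measurement: the helper name is made up, and the 8 GB figure for the OS and everything else in memory is an assumed placeholder.

```python
def ram_shortfall_gb(model_sizes_gb, ram_gb, other_usage_gb=0.0):
    """Return how many GB would have to spill to the page file.

    Sums the model files plus an assumed amount of other memory usage and
    compares the total with installed RAM. Returns 0.0 if everything fits.
    """
    needed = float(sum(model_sizes_gb)) + float(other_usage_gb)
    return max(0.0, needed - float(ram_gb))


if __name__ == "__main__":
    # Two 30 GB fp16 checkpoints on a 64 GB machine, assuming about 8 GB
    # is taken by the OS, text encoder, VAE and other processes.
    print(ram_shortfall_gb([30, 30], ram_gb=64, other_usage_gb=8))
```

Even before loras are counted, the two checkpoints alone leave only 4 GB of headroom on a 64 GB machine, so any realistic amount of other usage pushes part of the working set into the page file.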

jovan2009 avatar Nov 13 '25 12:11 jovan2009

Freezing like this is more often a Windows problem. As you can see, Linux handles it completely differently. We need to figure this out.

Vijay2359 avatar Nov 14 '25 13:11 Vijay2359

Yeah, probably. I wouldn't know; I never used Linux beyond trying it out in a shallow way. I'm a Windows user. And yes, Windows becomes unresponsive when the pagefile is heavily read or written. No matter how many CPU cores you have or how low their usage is, the whole system moves like molasses. This is especially true with a pagefile on an HDD; with an SSD it is less obvious. Edit: the technical term is probably "increased latency". Latencies that are normally in the range of milliseconds become multiple seconds, so thousands of times longer.

jovan2009 avatar Nov 14 '25 15:11 jovan2009

I have the same issue. It started two days ago; before that it was fine. Now I'm unable to use my wan 2.2 i2v workflow because the system freezes. I have even tried setting up a fresh ComfyUI instance, but got the same behaviour.

s3to87 avatar Nov 14 '25 19:11 s3to87

What is your configuration, RAM and disk drives in particular? I haven't noticed anything like this, and I also use wan 2.2 I2V. One general piece of advice is to put the pagefile on an SSD (preferably NVMe, not SATA) rather than an HDD, and to avoid assigning the pagefile to the system disk (the one where Windows resides).

Although it is generally recommended to let Windows manage the pagefile automatically, with the heavy RAM requirements that video generation implies my recommendation is to disable the pagefile, if possible, on drives that are: 1: HDDs. 2: The drives you load the safetensors from. 3: The system drive. In that order of importance.

Enable pagefile only on the remaining drives.

I realize that this might be impossible with your current computer configuration. I bought an NVMe SSD just to put the pagefile on it, so you might consider that. Even a SATA SSD would probably be a lot better if it lets you separate the pagefile from your system drive and from the drive you keep the models on, so that you are not loading safetensors and writing to the pagefile on the same disk at the same time. Pagefiles on multiple drives, as long as you follow the rules above, would be even better. Right now, though, I have only one drive that I can use for swap in the way I explained, and I can't complain.
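The placement rules above can be sketched as a small ranking function. This is only an illustration of the advice in this comment: the `Drive` class, its fields and the drive letters are hypothetical, and nothing here reads or changes actual Windows pagefile settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    letter: str
    is_hdd: bool = False
    holds_models: bool = False
    is_system: bool = False


def penalty(drive):
    # Tuples compare left to right, so the order of the fields mirrors the
    # stated order of importance: HDD first, then model drive, then system.
    return (drive.is_hdd, drive.holds_models, drive.is_system)


def pick_pagefile_drives(drives):
    """Return the drives that break the fewest (and least important) rules.

    If some drives break no rule at all, exactly those are returned;
    otherwise the least bad candidates are used as a fallback.
    """
    if not drives:
        return []
    best = min(penalty(d) for d in drives)
    return [d for d in drives if penalty(d) == best]


if __name__ == "__main__":
    drives = [
        Drive("C", is_system=True),
        Drive("D", is_hdd=True),
        Drive("E", holds_models=True),
        Drive("F"),  # dedicated SSD with nothing else on it
    ]
    print([d.letter for d in pick_pagefile_drives(drives)])
```

When no drive is free of all three conditions, the fallback prefers the system SSD over the model SSD, and either of those over an HDD, which matches the priority given above.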

jovan2009 avatar Nov 14 '25 19:11 jovan2009

@jovan2009 I was also using FP16 on 64 GB RAM, with the --cache-none option as a startup argument. This is the only way it could run the models in sequence rather than both at the same time. I make sure the operation stays between VRAM and RAM only and does not use a swapfile.

I'm on Linux with 64 GB RAM, loading the models from an NVMe disk. I also think Q8 is significantly better than fp8 (scaled) and provides near-fp16 quality, so that might be a good alternative to FP16.
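On Linux, one way to check that a run really stays in RAM is to compare `SwapTotal` and `SwapFree` in `/proc/meminfo` before and after. The sketch below is a minimal stdlib-only example of that check; the helper names are made up for illustration, and it is not something ComfyUI does itself.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB (0 if the fields are missing)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        # /proc/meminfo only exists on Linux.
        print("no /proc/meminfo on this platform")
    else:
        print(f"swap in use: {swap_used_kb(info) / 1024:.0f} MiB")
```

Running it once before starting a workflow and again while the second model loads shows whether swap usage grew during the run.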

boyan-orion avatar Nov 14 '25 20:11 boyan-orion

I am a "quality" freak; maybe it is mostly placebo, but the results still differ between fp16 and fp8. I always use fp16 safetensors, and a while ago I deleted the fp8 versions for "disk space optimization" purposes. I also have 64 GB RAM (I would have bought more if my motherboard could accommodate it, but it can't). I don't touch the cache or memory settings on ComfyUI's command line. I'm not bothered by swapping; I actually prefer it, because the SSD where I keep my models is SATA, so reloading a discarded model would be slower than reading the cached data back from the pagefile, which is on an NVMe SSD (probably at least 4-5 times faster than my SATA drive; I don't remember the exact numbers). @boyan-orion

Edit: Windows also has an OS feature, memory compression. At some moments during prompt execution the compressed memory area reaches 17 GB or more. I think that would not work the same way if the cache were discarded. I have no idea whether there is similar functionality in Linux.

jovan2009 avatar Nov 14 '25 21:11 jovan2009

I'm on 32 GB RAM / 12 GB VRAM, Windows, and tried both ComfyUI Desktop and Portable. My pagefile was indeed configured to use my HDD; I have just changed it to use my SSD instead and this has improved the situation massively, thanks! It's still weird that everything was fine up until two days ago, even with the pagefile on an HDD.

s3to87 avatar Nov 14 '25 21:11 s3to87

Pagefiles on multiple drives, as long you follow the rules above, would be even better.

I want to correct my own sentence quoted above. Pagefiles on multiple drives are better only as long as the drives are of similar speed. A pagefile spread across a fast drive and a slow drive will slow everything down, because if dumb Windows uses both, the whole thing is reduced to the lowest common denominator. I don't understand why, but Windows seems to act as if it is unaware (and probably really is) of the speeds of the drives it uses for swap. If you give it two drives, one HDD and one SSD, you have just shot yourself in the foot: everything will wait on the HDD.
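The "similar speed" rule can be expressed as a simple filter over candidate drives. This is a sketch under stated assumptions: the throughput numbers are illustrative placeholders rather than measurements, the 0.5 tolerance is arbitrary, and the function only suggests which drives to consider; it does not query or configure Windows.

```python
def similar_speed_drives(speeds_mb_s, tolerance=0.5):
    """Keep only drives at least `tolerance` times as fast as the fastest.

    `speeds_mb_s` maps a drive name to an approximate sequential
    throughput in MB/s. Slower drives are dropped so they cannot drag
    down paging on the fast ones.
    """
    if not speeds_mb_s:
        return []
    fastest = max(speeds_mb_s.values())
    return sorted(
        name for name, speed in speeds_mb_s.items()
        if speed >= tolerance * fastest
    )


if __name__ == "__main__":
    # Hypothetical machine: two NVMe drives and one mechanical HDD.
    print(similar_speed_drives({"nvme0": 3500, "nvme1": 3000, "hdd": 150}))
```

With one SSD and one HDD the filter keeps only the SSD, which is the configuration recommended in the correction above.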

jovan2009 avatar Nov 15 '25 00:11 jovan2009