RAM cache implementation - part II
This PR improves the robustness of the RAM cache implementation. It makes the RAM cache much friendlier to use and avoids users needing to size the cache specifically for their workflow. It also avoids OOMs in more cases, especially flows with multiple large models. There are three key changes:
1. Loosening the executor's pre-emptive cache pin on models, so that cached models late in a workflow can be freed to make space for earlier ones.
2. Pre-emptively freeing space for large models on load.
3. Freeing space on demand during the GPU -> RAM weight offload process.
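To make changes 2 and 3 concrete, here is a minimal, hypothetical sketch of the free-space-on-demand idea. This is not the actual ComfyUI code; `RamModelCache`, `free_bytes_fn`, and the eviction policy are illustrative assumptions only: cached models are dropped oldest-first until the incoming model (plus some headroom) fits in RAM.

```python
# Hypothetical sketch of on-demand RAM cache eviction.
# Names (RamModelCache, free_bytes_fn, size_bytes) are illustrative,
# not the real ComfyUI API.
from collections import OrderedDict


class RamModelCache:
    def __init__(self, free_bytes_fn, headroom_bytes=2 * 1024**3):
        self._entries = OrderedDict()          # key -> (model, size_bytes), oldest first
        self._free_bytes_fn = free_bytes_fn    # reports currently available system RAM
        self._headroom_bytes = headroom_bytes  # safety margin kept free after loading

    def _evict_until(self, needed_bytes):
        # Drop least-recently-used cached models until the requested space fits.
        while self._entries and self._free_bytes_fn() < needed_bytes:
            self._entries.popitem(last=False)

    def put(self, key, model, size_bytes):
        # Pre-emptively make room for a large model instead of OOM-ing.
        self._evict_until(size_bytes + self._headroom_bytes)
        self._entries[key] = (model, size_bytes)

    def get(self, key):
        # Mark as recently used so later workflow steps don't evict it first.
        if key in self._entries:
            self._entries.move_to_end(key)
            return self._entries[key][0]
        return None


# Example wiring (psutil is only used here to report available system RAM):
# import psutil
# cache = RamModelCache(free_bytes_fn=lambda: psutil.virtual_memory().available)
```

The same eviction path could be called from the GPU -> RAM offload step (change 3), so an offload that would otherwise exceed available RAM first frees cached models instead of failing.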
Example test conditions:
Linux, RTX 5090, swapoff, 96GB RAM
Workflow: Flux FP16 -> qwen FP16 -> wan 2.2 FP16 (giant-flow.json)
In the screenshot it's executing wan. The RAM trace shows usage dropping from 95% to make space for wan after qwen.
On rerun it still has all the text encodings cached for re-use.
Hey, is this PR in a state where it can be taken off draft + reviewed, or is it still in the oven?
Hey, this is stuck on draft because it conflicts with async offloading, and I will need to do a small rebase and retest. Feel free to review though.
There's a bug when I enable the RAM cache on a simulated 50GB RAM + 24GB VRAM setup.
I run this workflow twice in a row:
It unloads the high noise model on the first workflow run, which is good, but the second time it gets stuck on the first sampler node.
It should be able to unload the low noise model properly on the second run now.
