
RAM cache implementation - part II

Open rattus128 opened this issue 2 months ago • 4 comments

This PR improves the robustness of the RAM cache implementation. This makes the RAM cache much friendlier to use and means users no longer need to size the cache specifically for their workflow. It also avoids OOMs in more cases, especially in flows with multiple large models. There are three key changes:

1. Loosening the executor's pre-emptive cache pin on models, so that cached models late in a workflow can be freed to make space for earlier ones
2. Pre-emptively freeing space for large models on load
3. Freeing space on demand during the GPU -> RAM weight offload process (see the sketch after this list)
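The on-demand freeing in point 3 can be pictured roughly as below. This is a minimal sketch, not the PR's actual code: `ram_cache`, `free_ram_for`, and `cache_model` are hypothetical names used for illustration, and the only real API relied on is `psutil.virtual_memory()`.

```python
# Sketch: evict least-recently-used cached models until enough free system RAM
# is available for an incoming model, instead of letting the offload OOM.
import psutil
from collections import OrderedDict

ram_cache = OrderedDict()  # hypothetical cache: model_key -> model, oldest first

def free_ram_for(bytes_needed, headroom=1.2):
    """Evict cached models until `bytes_needed` (plus some headroom) fits in free RAM."""
    target = int(bytes_needed * headroom)
    while psutil.virtual_memory().available < target and ram_cache:
        key, model = ram_cache.popitem(last=False)  # drop the oldest entry
        del model  # drop our reference so the weights can be garbage collected

def cache_model(key, model, size_bytes):
    """Offload a model's weights to RAM, making space on demand first."""
    free_ram_for(size_bytes)
    ram_cache[key] = model
    ram_cache.move_to_end(key)  # mark as most recently used
```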

Example test conditions:

Linux, RTX 5090, swapoff, 96GB RAM
Workflow: Flux FP16 -> Qwen FP16 -> Wan 2.2 FP16 (giant-flow.json)

[Screenshot: giant-flow-scr]

In the screenshot it's executing Wan. The RAM trace shows usage dropping from 95% to make space for Wan after Qwen.

On rerun it still has all the text encodings cached for re-use.

rattus128, Nov 18 '25

Hey, is this PR in a state where it can be taken off draft + reviewed, or is it still in the oven?

Kosinkadink, Nov 22 '25

> Hey, is this PR in a state where it can be taken off draft + reviewed, or is it still in the oven?

Hey, it's stuck in draft because it conflicts with async offloading, and I will need to do a small rebase and retest. Feel free to review though.

rattus128, Nov 22 '25

There's a bug when I enable the RAM cache on a simulated 50GB RAM + 24GB VRAM setup.

I run this workflow twice in a row:

[Workflow screenshot: ComfyUI_277027_]

It unloads the high noise model on the first workflow run, which is good, but on the second run it gets stuck on the first sampler node.

comfyanonymous, Dec 03 '25

> There's a bug when I enable the RAM cache on a simulated 50GB RAM + 24GB VRAM setup.
>
> I run this workflow twice in a row: [Workflow screenshot: ComfyUI_277027_]
>
> It unloads the high noise model on the first workflow run, which is good, but on the second run it gets stuck on the first sampler node.

It should be able to unload the low noise model properly on the second run now.

rattus128, Dec 19 '25