
Avoid CPU OOM by loading diffusion model from state dict with assign

Open · strint opened this issue 3 weeks ago · 7 comments

When --mmap-torch-files is enabled, ComfyUI loads .ckpt and .pt files using mmap, significantly reducing CPU memory usage during file loading.

However, during load_model_weights in UNetLoader, the state dict is normally copied from the memory-mapped file into standard CPU memory, negating the benefit for model weights.

By using assign=True, the loader reuses the underlying tensor storage directly from the memory-mapped state dict. This avoids unnecessary copies and preserves the memory savings when loading large models via mmap.

With this improvement, ComfyUI can load multiple large models without causing CPU memory OOM.
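For context, here is a minimal sketch of the difference (TinyNet and the file name are placeholders, not ComfyUI code; torch.load's mmap argument needs a recent PyTorch):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # stand-in for the diffusion model
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(1024, 1024)

torch.save(TinyNet().state_dict(), "weights.pt")

# mmap=True keeps the loaded tensors backed by the file mapping.
sd = torch.load("weights.pt", map_location="cpu", mmap=True)

copied = TinyNet()
copied.load_state_dict(sd)                 # default: copies into fresh CPU RAM

assigned = TinyNet()
assigned.load_state_dict(sd, assign=True)  # adopts the mmap-backed tensors
```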

strint commented Dec 12 '25 09:12

This is an awesome potential change for performance and I have had it on the radar for a while.

How does this interact with pinned memory? My understanding is that mmapped memory cannot be pinned in place using the approach currently taken in pin_memory (cudaHostRegister). Does this just error out and fall back to no pinning?

This is a disruptive change, and the community has actually gone to the effort of explicitly offloading mmaps in the past due to poor OS support.

https://github.com/city96/ComfyUI-GGUF/blob/main/nodes.py#L98

This probably should spend some time behind a --fast startup argument for stabilization.

rattus128 commented Dec 12 '25 12:12

> How does this interact with pinned memory? My understanding is that mmapped memory cannot be pinned in place using the approach currently taken in pin_memory (cudaHostRegister). Does this just error out and fall back to no pinning?

At the diffusion model loading node, it appears that pinned memory is not used. The function load_torch_file loads the model file from disk into a tensor dictionary. Then load_model_weights copies that tensor dictionary into the model’s state dict. Finally, the KSampler node transfers the model state dict to GPU VRAM through load_models_gpu. Based on this flow, it seems that the model parameters never reside in pinned memory.
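An illustrative probe of that claim (not ComfyUI code; nn.Linear stands in for the model):

```python
import torch
import torch.nn as nn

torch.save(nn.Linear(8, 8).state_dict(), "w.pt")
sd = torch.load("w.pt", map_location="cpu", mmap=True)

loaded = nn.Linear(8, 8)
loaded.load_state_dict(sd)  # the copy that load_model_weights performs today

# The parameter ends up in fresh, pageable CPU memory: neither in the
# file mapping nor in pinned memory.
print(loaded.weight.data_ptr() == sd["weight"].data_ptr())  # False
print(loaded.weight.is_pinned())                            # False
```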

strint commented Dec 15 '25 09:12

> This is a disruptive change, and the community has actually gone to the effort of explicitly offloading mmaps in the past due to poor OS support.
>
> https://github.com/city96/ComfyUI-GGUF/blob/main/nodes.py#L98
>
> This probably should spend some time behind a --fast startup argument for stabilization.

You are correct. Safetensors loading uses mmap by default, and there is an argument --disable-mmap to turn this off.

Meanwhile, when mmap loading is enabled, there is no need to copy the tensor dictionary into regular CPU memory before the KSampler node uses it, which saves a significant amount of CPU RAM.

To make this compatible, I added the following check:

```python
delay_copy_with_assign = utils.MMAP_TORCH_FILES or not utils.DISABLE_MMAP
```
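Roughly how that flag would gate the load path (a simplified sketch under the patch's assumed flag names; the real load_model_weights is more involved):

```python
import torch

# Mirrors of the comfy.utils flags from the patch (assumed names):
MMAP_TORCH_FILES = True   # set by --mmap-torch-files
DISABLE_MMAP = False      # set by --disable-mmap

delay_copy_with_assign = MMAP_TORCH_FILES or not DISABLE_MMAP

def load_model_weights(model: torch.nn.Module, sd: dict):
    # assign=True adopts the mmap-backed tensors directly; assign=False
    # keeps today's behaviour of copying into regular CPU memory.
    return model.load_state_dict(sd, strict=False, assign=delay_copy_with_assign)
```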

strint commented Dec 15 '25 09:12

> How does this interact with pinned memory? My understanding is that mmapped memory cannot be pinned in place using the approach currently taken in pin_memory (cudaHostRegister). Does this just error out and fall back to no pinning?
>
> At the diffusion model loading node, it appears that pinned memory is not used. The function load_torch_file loads the model file from disk into a tensor dictionary. Then load_model_weights copies that tensor dictionary into the model’s state dict. Finally, the KSampler node transfers the model state dict to GPU VRAM through load_models_gpu. Based on this flow, it seems that the model parameters never reside in pinned memory.

The comfy model loader does an in-place pinning of the loaded weights here:

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/model_management.py#L1154

I don't think this works on an mmap. I remember trying it briefly (it would be the holy grail of performance to go from disk straight into pinned memory in a single copy, if it did work!)
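For anyone who wants to retry that experiment, a minimal probe along these lines should show the behaviour ("model.pt" is a placeholder, and the error handling is deliberately loose):

```python
import torch

sd = torch.load("model.pt", map_location="cpu", mmap=True)
t = next(iter(sd.values()))

cudart = torch.cuda.cudart()
err = cudart.cudaHostRegister(
    t.data_ptr(),
    t.numel() * t.element_size(),
    0,  # cudaHostRegisterDefault
)
if int(err) != 0:  # anything other than cudaSuccess
    # File-backed (mmap) pages are typically rejected here, so a loader
    # would have to fall back to unpinned host memory.
    print(f"cudaHostRegister failed: {err}")
else:
    # Pinned in place; unregister before the file mapping goes away.
    cudart.cudaHostUnregister(t.data_ptr())
```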

rattus128 commented Dec 19 '25 02:12