
free_memory does not work

Open huangqiaobo opened this issue 1 year ago • 12 comments

Expected Behavior

free_memory releases the memory used during execution

Actual Behavior

Some memory is not released after execution.

Steps to Reproduce

My version is commit 38c22e6. This problem also exists in previous versions such as v0.0.8.

  1. Edit the prompt_worker function and add logging to display the process RSS.
  2. Set free_memory to True.
  3. Start ComfyUI: main.py --cpu --disable-all-custom-nodes
  4. Run a simple workflow with CheckpointLoaderSimple (workflow.json).
# Imports assumed from ComfyUI's main.py, where prompt_worker lives.
import gc
import logging
import time

import psutil

import comfy.model_management
import execution
from comfy.cli_args import args


def prompt_worker(q, server):
    e = execution.PromptExecutor(server, lru_size=args.cache_lru)
    last_gc_collect = 0
    need_gc = False
    gc_collect_interval = 10.0

    process = psutil.Process()
    while True:
        timeout = 1000.0
        if need_gc:
            current_time = time.perf_counter()
            timeout = max(gc_collect_interval - (current_time - last_gc_collect), 0.0)

        queue_item = q.get(timeout=timeout)
        if queue_item is not None:
            item, item_id = queue_item
            execution_start_time = time.perf_counter()
            prompt_id = item[1]
            server.last_prompt_id = prompt_id
            logging.info(
                f"before e.execute {round(process.memory_info().rss / (1024**2), 2)}MB"
            )

            e.execute(item[2], prompt_id, item[3], item[4])
            logging.info(
                f"after e.execute {round(process.memory_info().rss / (1024**2), 2)}MB"
            )
            need_gc = True
            q.task_done(
                item_id,
                e.history_result,
                status=execution.PromptQueue.ExecutionStatus(
                    status_str="success" if e.success else "error",
                    completed=e.success,
                    messages=e.status_messages,
                ),
            )
            if server.client_id is not None:
                server.send_sync(
                    "executing",
                    {"node": None, "prompt_id": prompt_id},
                    server.client_id,
                )

            current_time = time.perf_counter()
            execution_time = current_time - execution_start_time
            logging.info("Prompt executed in {:.2f} seconds".format(execution_time))

        flags = q.get_flags()
        free_memory = True  # patched to force it on for this test; originally: flags.get("free_memory", False)

        if flags.get("unload_models", free_memory):
            comfy.model_management.unload_all_models()
            need_gc = True
            last_gc_collect = 0

        if free_memory:
            e.reset()
            need_gc = True
            last_gc_collect = 0
            logging.info("run e.reset()")

        if need_gc:
            current_time = time.perf_counter()
            if (current_time - last_gc_collect) > gc_collect_interval:
                comfy.model_management.cleanup_models()
                gc.collect()
                comfy.model_management.soft_empty_cache()
                last_gc_collect = current_time
                need_gc = False
                logging.info(
                    f"after need_gc {round(process.memory_info().rss / (1024**2), 2)}MB"
                )

Debug Logs

got prompt
before e.execute 350.97MB
model weight dtype torch.float32, manual cast: None
model_type EPS
Using split attention in VAE
Using split attention in VAE
Requested to load SD1ClipModel
Loading 1 new model
loaded completely 0.0 235.84423828125 True
Requested to load BaseModel
Loading 1 new model
loaded completely 0.0 3278.812271118164 True
100%|███████████████████████████████████| 1/1 [00:04<00:00,  4.08s/it]
Requested to load AutoencoderKL
Loading 1 new model
loaded completely 0.0 319.11416244506836 True
after e.execute 5324.84MB
Prompt executed in 17.68 seconds
run e.reset()
after need_gc 3316.52MB

Other

  1. before e.execute 350.97MB
  2. after e.execute 5324.84MB
  3. run e.reset()
  4. after need_gc 3316.52MB

Nearly 3 GB is not released. I'm using a 2 GB checkpoint; with a bigger checkpoint, the unreleased amount grows accordingly. What is this memory, and how do I release it?

huangqiaobo avatar Aug 28 '24 11:08 huangqiaobo

FYI, that feature doesn't release cached data. You have to execute an empty workflow before free_memory if you want to wipe everything.
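For reference, the unload_models/free_memory flags that prompt_worker reads from the queue are set through the server's HTTP API. A minimal client sketch, assuming a default ComfyUI instance on 127.0.0.1:8188 whose POST /free endpoint accepts these flags (build_free_request is an illustrative helper, not part of ComfyUI):

```python
import json
import urllib.request


def build_free_request(host="127.0.0.1", port=8188,
                       unload_models=True, free_memory=True):
    # POST /free asks the server to unload models and/or drop the
    # execution cache once the current queue item finishes.
    payload = json.dumps({"unload_models": unload_models,
                          "free_memory": free_memory}).encode()
    return urllib.request.Request(
        f"http://{host}:{port}/free",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_free_request()
    # urllib.request.urlopen(req)  # uncomment with a running ComfyUI server
    print(req.full_url)
```

This only sets the flags; as noted above, cached outputs from the last executed workflow may still be held until a different (e.g. empty) workflow runs.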

ltdrdata avatar Aug 28 '24 12:08 ltdrdata

FYI, that feature doesn't release cached data. You have to execute an empty workflow before free_memory if you want to wipe everything.

[image: workflow screenshot]

I then ran this workflow and the memory was still not released

huangqiaobo avatar Aug 28 '24 12:08 huangqiaobo

FYI, that feature doesn't release cached data. You have to execute an empty workflow before free_memory if you want to wipe everything.

[image] I then ran this workflow, but the memory was still not released.

Have you updated ComfyUI to today's fix? https://github.com/comfyanonymous/ComfyUI/commit/38c22e631ad090a4841e4a0f015a30c565a9f7fc

Shyryp avatar Aug 28 '24 18:08 Shyryp

FYI, that feature doesn't release cached data. You have to execute an empty workflow before free_memory if you want to wipe everything.

[image] I then ran this workflow, but the memory was still not released.

Have you updated ComfyUI to today's fix? 38c22e6

yes, 38c22e6

huangqiaobo avatar Aug 29 '24 03:08 huangqiaobo

I found that the memory leak happens in the torch function module._load_from_state_dict.

huangqiaobo avatar Aug 29 '24 09:08 huangqiaobo

I found that the memory leak happens in the torch function module._load_from_state_dict.

Do you use torch 2.4.0 on Windows?

ltdrdata avatar Aug 29 '24 10:08 ltdrdata

I found that the memory leak happens in the torch function module._load_from_state_dict.

Do you use torch 2.4.0 on Windows?

I have tried torch 2.3.0 on macOS, and 2.4.0 and 2.5.0.dev20240827+cu124 on Linux. They all have this problem.

I have found that the memory increase happens at this line: param.copy_(input_param)

The param and input_param tensors can be released by gc, but the process memory does not decrease after gc runs.
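One reason RSS can stay high even after every tensor is collectible is the C allocator itself: on Linux, glibc's malloc often keeps freed pages in its arenas instead of returning them to the kernel. A Linux-only sketch demonstrating this, and forcing a trim with glibc's malloc_trim (rss_mb is an illustrative helper reading /proc/self/statm; the numbers depend on the allocator, so none are shown):

```python
import ctypes
import ctypes.util
import gc
import os


def rss_mb():
    # Resident set size from /proc (Linux-only); field 1 is resident pages.
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1024**2


# Allocate then free many small blocks; the freed memory may stay in
# glibc's arenas, so RSS does not necessarily drop after gc.collect().
blobs = [bytearray(4096) for _ in range(200_000)]
del blobs
gc.collect()
print(f"after gc: {rss_mb():.1f} MB")

# malloc_trim(0) asks glibc to return unused arena pages to the kernel.
# hasattr is False on non-glibc libcs (e.g. musl), where this is a no-op.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
if hasattr(libc, "malloc_trim"):
    libc.malloc_trim(0)
    print(f"after malloc_trim: {rss_mb():.1f} MB")
```

Note this only explains allocator-held pages; pytorch's own caching layers (device pool, pinned-host cache) sit on top of this and are discussed below in the thread.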

huangqiaobo avatar Aug 29 '24 11:08 huangqiaobo

[image: memory screenshot]

Seems like the model disposal is the issue here (this does clear the VRAM but duplicates the cache in RAM):

    model_management.unload_all_models()
    model_management.soft_empty_cache(True)
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()
    gc.collect()

Why? Because the gc would normally detect and clear the leftovers, but apparently it doesn't, which suggests the problem is on the code side (i.e., something is not disposed of properly, so the gc overlooks it).

Since this also reproduces on Linux, I have a question: does ComfyUI or torch try to manage the cache itself?

TheFrieber avatar Sep 14 '24 20:09 TheFrieber

[image: memory screenshot]

Seems like the model disposal is the issue here (this does clear the VRAM but duplicates the cache in RAM):

    model_management.unload_all_models()
    model_management.soft_empty_cache(True)
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()
    gc.collect()

Why? Because the gc would normally detect and clear the leftovers, but apparently it doesn't, which suggests the problem is on the code side (i.e., something is not disposed of properly, so the gc overlooks it).

Since this also reproduces on Linux, I have a question: does ComfyUI or torch try to manage the cache itself?

torch has a memory pool.

I want to know if there is any way to release this memory pool.

huangqiaobo avatar Sep 19 '24 06:09 huangqiaobo

Has anyone solved this problem?

denred0 avatar Nov 01 '24 10:11 denred0

I guess this unreleased memory is pinned memory; it's managed by the CUDACachingHostAllocator.

If we want to clean up this memory, we need to call the CachingHostAllocator_emptyCache method.

In torch versions after 2.5.1, this can be called as torch._C._host_emptyCache().

See this pytorch issue about pinned memory: https://github.com/pytorch/pytorch/issues/134332

huangqiaobo avatar Nov 14 '24 11:11 huangqiaobo

Sad.

The torch._C._host_emptyCache() method does not work in this case, because CheckpointLoaderSimple may not use pinned memory.

However, some third-party nodes use pin_memory=True when using torch.utils.data.DataLoader.

If you use such nodes, you may need to call torch._C._host_emptyCache() to release memory.
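A defensive sketch of that call for such cases. _host_emptyCache is a private API that only exists in newer torch builds (roughly 2.5.1+), so the lookup is guarded; the function name free_pinned_host_cache is illustrative:

```python
import gc


def free_pinned_host_cache():
    """Best-effort release of torch's pinned-host-memory cache.

    Returns True if the private cache-emptying hook was found and
    called, False otherwise (no torch, no CUDA, or an older build).
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False  # pinned host memory only exists alongside CUDA
    gc.collect()  # drop Python references first so cached blocks are idle
    host_empty = getattr(torch._C, "_host_emptyCache", None)
    if callable(host_empty):
        host_empty()
        return True
    return False
```

Since this relies on a private hook, it can break without notice across torch releases; treat it as a diagnostic tool rather than something to ship.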

huangqiaobo avatar Nov 15 '24 09:11 huangqiaobo

@huangqiaobo, how did you confirm your idea that

I have found that the memory increase happens at this line: [param.copy_(input_param)](https://github.com/pytorch/pytorch/blob/d01a7a9faa5a742a3df7374b97bbc1db1205b6ed/torch/nn/modules/module.py#L2425)

with a memory leak detector tool? By the way, does that mean there is no good way to solve this issue other than restarting? Thanks!

t00350320 avatar Jan 14 '25 02:01 t00350320

@huangqiaobo, how did you confirm your idea that

I have found that the memory increase happens at this line: [param.copy_(input_param)](https://github.com/pytorch/pytorch/blob/d01a7a9faa5a742a3df7374b97bbc1db1205b6ed/torch/nn/modules/module.py#L2425)

with a memory leak detector tool? By the way, does that mean there is no good way to solve this issue other than restarting? Thanks!

At the beginning, I found that after param.copy_(input_param) is executed the memory increases, but it does not decrease after clearing the cache and running free_memory, so I thought a memory leak had occurred. In fact, this was wrong.

The memory growth happens inside pytorch. The newest versions of pytorch have better memory behavior.

torch.__version__: 2.4.0+cu121
load model_2GB.safetensors:rss=578.3MB
after load_file:rss=586.04MB
end:rss=3543.02MB
after collect:rss=1902.7MB

torch.__version__: 2.6.0.dev20241115+cu124
load model_2GB.safetensors:rss=601.0MB
after load_file:rss=607.55MB
end:rss=2660.03MB
after collect:rss=1019.71MB

These logs were generated a few months ago.

Here is my test code:

import sys

sys.path.append("./comfyui")

import safetensors.torch
import psutil
import gc
from comfy.cli_args import args
import torch

# args.highvram = True

import comfy.sd

process = psutil.Process()
model_path_1 = "./model_2GB.safetensors"


def print_rss(l=""):
    rss: float = round(process.memory_info().rss / (1024**2), 2)
    print(f"{l}:rss={rss}MB")

def load(model_path):
    print_rss(f"load {model_path}")
    sd = safetensors.torch.load_file(model_path, device="cpu")
    print_rss("after load_file")
    comfy.sd.load_state_dict_guess_config(
        sd,
        output_vae=False,
        output_clip=False,
        output_clipvision=False,
        embedding_directory=None,
        output_model=True,
        model_options={},
    )
    print_rss("end")


# gc.freeze()
gc.collect()
load(model_path_1)
gc.collect()
print_rss("after collect")

The test code was also written a few months ago.

After upgrading pytorch, the situation improved, and I no longer pay attention to this issue.

I hope you can resolve this issue completely.

Thanks!

huangqiaobo avatar Jan 14 '25 09:01 huangqiaobo

I updated ComfyUI to v0.3.59 and it now devours 24 GB of VRAM with 2 SDXL models. I can't use a workflow anymore that I created over the past months. I bet it's a pytorch problem, but since I installed all the dependencies specifically for other nodes in Comfy, it's bricked. Thank you.

Edit:

System Properties -> Advanced system settings -> Performance -> manually increase the virtual memory on the drive from 16 MB to double the size of your VRAM. That fixed the problem for me.

Weaze1 avatar Sep 12 '25 13:09 Weaze1