ComfyUI Performance issue starting at commit 08f92d5 adding 10 sec delay to start KSampler

Expected Behavior

Regenerating images using the same prompt should take roughly the same time every time.

Actual Behavior

Since a recent update (commit 08f92d5, to be more precise), regenerating the same image takes an extra 10 seconds before the KSampler even starts.

Notice how the it/s is pretty much the same, yet the prompt takes 10s longer to execute. Whenever this happens, there's an extra line in the logs saying loaded completely, right above the progress percentage.

Steps to Reproduce

I executed git checkout ad76574cb8b28ee498f3dceafc9d00b56f12f992 (latest master commit at the time of writing)
I launched ComfyUI with all custom nodes disabled (--disable-all-custom-nodes)
I loaded this workflow, since it loads an SDXL model along with 2 LoRAs. For the record, this was tested on a laptop with RTX 3080 8GB VRAM and 32GB of RAM.
I generated several images with the same workflow, and most of the time there's a 10-second delay before the KSampler is executed (see logs in screenshot attached to Actual Behavior).
I executed git checkout 8115d8cce97a3edaaad8b08b45ab37c6782e1cb4 (commit 8115d8c), then ran the workflow several times - no problem at all, could not reproduce the issue.
I executed git checkout 08f92d55e934c19f753b47ec4c51760c68bbe2b7 to pull the next commit from history, then ran the workflow - problem started here.
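
For anyone narrowing down a similar regression across a longer range of commits, the manual checkouts above amount to a search for the first bad commit. Below is a minimal sketch of that search logic, assuming the commits are ordered oldest to newest, the last one is known to be bad, and every commit after the first bad one is also bad. The names `first_bad_commit` and `is_bad` are hypothetical; in practice `is_bad` would mean checking out the commit and running the workflow by hand, and `git bisect` automates the same idea.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first commit where is_bad(commit) is True.

    Assumes commits are ordered oldest -> newest, the last commit is bad,
    and once a commit is bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


# Only the three commits named in this report; any commits between
# 08f92d5 and ad76574 are omitted here.
history = ["8115d8c", "08f92d5", "ad76574"]

# Stand-in for "check out the commit, run the workflow, look for the delay".
observed_bad = {"08f92d5", "ad76574"}

print(first_bad_commit(history, lambda c: c in observed_bad))
```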

Debug Logs

got prompt
Requested to load SDXLClipModel
Loading 1 new model
Requested to load SDXL
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:17<00:00,  1.67it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 24.83 seconds
got prompt
loaded completely 5262.788785171509 4897.0483474731445
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:18<00:00,  1.60it/s]
Prompt executed in 34.30 seconds
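
To put a number on the time spent outside of sampling, the log lines above can be compared directly. The following is a rough sketch that subtracts the elapsed time shown in the progress bar from the "Prompt executed in" total for each prompt. The function name `overhead_per_prompt` and the regular expressions are my own, not part of ComfyUI, and the progress bar only reports whole seconds, so the result is approximate.

```python
import re

# Progress-bar line, e.g. "... 30/30 [00:17<00:00,  1.67it/s]"
PROGRESS_RE = re.compile(r"(\d+)/(\d+) \[(\d+):(\d+)<")
# Summary line, e.g. "Prompt executed in 24.83 seconds"
TOTAL_RE = re.compile(r"Prompt executed in ([\d.]+) seconds")


def overhead_per_prompt(log_text):
    """Return a list of (total_s, sampling_s, overhead_s), one per prompt."""
    results = []
    sampling = None
    for line in log_text.splitlines():
        if line.startswith("got prompt"):
            sampling = None  # a new prompt starts; forget the previous bar
        m = PROGRESS_RE.search(line)
        if m and m.group(1) == m.group(2):  # only the finished bar (30/30)
            sampling = int(m.group(3)) * 60 + int(m.group(4))
        m = TOTAL_RE.search(line)
        if m and sampling is not None:
            total = float(m.group(1))
            results.append((total, sampling, round(total - sampling, 2)))
    return results


# Shortened copy of the log above (progress bars trimmed for width).
LOG = """got prompt
Requested to load SDXL
100%|████| 30/30 [00:17<00:00,  1.67it/s]
Prompt executed in 24.83 seconds
got prompt
loaded completely 5262.788785171509 4897.0483474731445
100%|████| 30/30 [00:18<00:00,  1.60it/s]
Prompt executed in 34.30 seconds
"""

for total, sampling, overhead in overhead_per_prompt(LOG):
    print(f"total={total:.2f}s sampling={sampling}s overhead={overhead:.2f}s")
```

On this excerpt the second prompt spends about 16 seconds outside of sampling versus about 8 seconds for the first, even though the first run also had to load every model from scratch.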

Other

No response

Aug 12 '24 06:08 fappaz

I think this is because it loads the actual transformer/UNet when it reaches the KSampler. When it first looks like it's loading the checkpoint, it's probably just loading the T5 or CLIP text encoders for the prompt, depending on what model you're working with. Once it reaches the KSampler, if you have Task Manager open, you'll see disk activity indicating that it's loading something.

I noticed this while messing around with Flux. It probably makes memory management much easier for bigger models, like ones that use T5, which takes up a lot of memory. That way, it can load the text encoders, do the encoding, and then offload or purge them to make room for the main model. I think there was also a PR that lets encoders load directly to VRAM now, saving a potential copy step as well.

Aug 12 '24 12:08 RandomGitUser321

Should be fixed now.

Aug 12 '24 16:08 comfyanonymous

I can confirm that this has been fixed in the latest commits, as I can no longer reproduce the issue.

got prompt
loaded completely 5201.311734390259 4897.0483474731445
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:16<00:00,  1.77it/s]
Prompt executed in 20.28 seconds
got prompt
loaded completely 5201.273587417602 4897.0483474731445
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:17<00:00,  1.76it/s]
Prompt executed in 20.22 seconds
got prompt
loaded completely 5229.418118667602 4897.0483474731445
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:17<00:00,  1.75it/s]
Prompt executed in 20.32 seconds

As a bonus, the total execution time is now 3-4 seconds faster. Thanks for all your incredible work.

Aug 12 '24 20:08 fappaz