
0.3.76: LoRA causes huge slowdown and low VRAM allocation

Open NulliferBones opened this issue 4 weeks ago • 11 comments

Custom Node Testing

Your question

In 0.3.76, using a LoRA causes a huge slowdown.

VRAM allocation is also low in general (a large MB buffer is reserved).

The issues are especially prevalent with z-image-turbo.

Logs


Other

No response

NulliferBones avatar Dec 02 '25 17:12 NulliferBones

Are you using the bf16 weights?

comfyanonymous avatar Dec 02 '25 18:12 comfyanonymous

Are you using the bf16 weights?

Using the BF16 and normal fp8 weights almost negates the speed loss, thanks. (It's the KJ scaled weights causing the huge speed loss.)

However, when any LoRA is enabled with any model, VRAM allocation is off: it adds a huge MB reserved buffer instead of actually using the VRAM, which also causes overflow into lowvram patches.

Also, with this update, if I try to use something like technically color z, it just ruins the image quality.

Images with and without:

[Image] [Image]

NulliferBones avatar Dec 02 '25 18:12 NulliferBones

How big is your reserved buffer, and what is your GPU? Can you post the load stats line?

rattus128 avatar Dec 02 '25 21:12 rattus128

The update made LoRAs apply correctly on Z Image; before that, most of the LoRA weights were getting skipped, so the LoRAs were much weaker than they should have been.

comfyanonymous avatar Dec 02 '25 22:12 comfyanonymous
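For context on the fix described above, a LoRA patch adds a low-rank update to a base weight, roughly W' = W + scale * (B @ A). A toy sketch (illustrative only, not ComfyUI's actual patching code) shows why silently skipping patches makes a LoRA look "weaker": the merged weight simply stays closer to the base.

```python
# Toy LoRA merge (not ComfyUI's actual code): W' = W + scale * (B @ A).
# If a patch is skipped, the weight stays at the base value, so the
# LoRA's visible effect is diluted.

def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_lora(base, down, up, scale, skip=False):
    """Merge one LoRA patch into a base weight matrix."""
    if skip:  # buggy path: patch silently dropped
        return [row[:] for row in base]
    delta = matmul(up, down)
    return [[base[i][j] + scale * delta[i][j]
             for j in range(len(base[0]))]
            for i in range(len(base))]

base = [[1.0, 0.0], [0.0, 1.0]]
down = [[1.0, 1.0]]    # rank-1 "A" factor (1x2)
up = [[0.5], [0.5]]    # rank-1 "B" factor (2x1)

patched = apply_lora(base, down, up, scale=1.0)
skipped = apply_lora(base, down, up, scale=1.0, skip=True)
print(patched)  # [[1.5, 0.5], [0.5, 1.5]]
print(skipped)  # [[1.0, 0.0], [0.0, 1.0]] — base unchanged
```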

How big is your reserved buffer, and what is your GPU? Can you post the load stats line?

The buffer is 1.6 GB with a LoRA added, leaving me using only about 3.6 GB of VRAM total. Without the LoRA, the buffer is roughly 100 MB and I use about 5 GB of VRAM total.

My GPU is Pascal. Before 0.3.76/0.3.75 this issue didn't exist.

NulliferBones avatar Dec 02 '25 23:12 NulliferBones
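The "reserved buffer" reported here is memory the caching allocator holds onto without it being actively in use, so reserved can greatly exceed allocated. A toy model (illustrative only, not ComfyUI's or PyTorch's actual allocator; block size is hypothetical) of how that gap arises:

```python
# Toy caching allocator: it grabs memory from the device in fixed-size
# blocks and keeps freed blocks in a pool, so "reserved" (everything
# grabbed) can far exceed "allocated" (actually in use).

BLOCK = 256  # MB per block — hypothetical pool granularity

class ToyCachingAllocator:
    def __init__(self):
        self.reserved = 0   # MB held from the device
        self.allocated = 0  # MB actually in use

    def alloc(self, mb):
        # Round the request up to whole blocks, like a pool would.
        need = -(-mb // BLOCK) * BLOCK
        if self.reserved - self.allocated < need:
            self.reserved += need  # grab more from the device
        self.allocated += mb

    def free(self, mb):
        # Freed memory returns to the pool, not to the device,
        # so `reserved` stays high afterwards.
        self.allocated -= mb

a = ToyCachingAllocator()
a.alloc(1000)  # e.g. model weights
a.alloc(500)   # e.g. temporary LoRA-patching buffers
a.free(500)    # scratch released, but its blocks stay reserved
print(a.allocated, a.reserved)  # 1000 1536
```

In a real PyTorch setup, `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` report the corresponding live numbers.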

The update made LoRAs apply correctly on Z Image; before that, most of the LoRA weights were getting skipped, so the LoRAs were much weaker than they should have been.

Yeah, but now it's just completely destroying the image for me (like the image I posted) unless I use a resolution like 1088x1440.

NulliferBones avatar Dec 02 '25 23:12 NulliferBones

@comfyanonymous The VRAM allocation issue also occurs with Flux2 LoRAs. When I use a LoRA, the reserved buffer is huge while a lot of VRAM sits free, and it's causing significant speed degradation.

With LoRA (RTX 4090, 19.92 s/it, 12636.00 MB buffer reserved, lowvram patches: 152):

[Image]

Without LoRA (12.20 s/it, 972.00 MB buffer reserved, lowvram patches: 0):

[Image]

I'm using core nodes to load the weights and the LoRA:

[Image]

To ensure that the issue was not related to my installation, I created a new virtual environment and reinstalled all dependencies before testing.

VandersonQk avatar Dec 03 '25 00:12 VandersonQk

Use weight_dtype default.

comfyanonymous avatar Dec 03 '25 00:12 comfyanonymous

Same result:

[Image]

VandersonQk avatar Dec 03 '25 00:12 VandersonQk

OK @VandersonQk, I think I see your problem, and I'm working on it. I'm going to track this one over here, which I'm pretty sure is the same report:

https://github.com/comfyanonymous/ComfyUI/issues/11058

rattus128 avatar Dec 03 '25 02:12 rattus128

The VRAM allocation bug is currently happening on all models for me (when using a LoRA).

NulliferBones avatar Dec 03 '25 02:12 NulliferBones