0.3.76: LoRA causes huge slowdown and low VRAM allocation
Custom Node Testing
- [x] I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Your question
In 0.3.76, using a LoRA causes a huge slowdown.
VRAM allocation is also low in general (a large MB buffer is reserved).
The issues are especially prevalent with z-image-turbo.
Logs
Other
No response
Are you using the bf16 weights?
> Are you using the bf16 weights?
Using the BF16 and normal fp8 weights almost negates the speed loss, thanks. (It's the KJ scaled weights causing the huge speed loss.)
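For reference, here's a rough way to check which dtypes a checkpoint actually contains, by reading only the safetensors header (no weights get loaded; the file path below is just a placeholder):

```python
import json
import struct
from collections import Counter

def checkpoint_dtypes(path):
    """Count tensor dtypes listed in a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # first 8 bytes: header size (little-endian u64)
        header = json.loads(f.read(header_len))          # JSON header: per-tensor dtype/shape/offsets
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )

# Placeholder path -- point it at whatever checkpoint you are loading.
print(checkpoint_dtypes("models/diffusion_models/z_image_turbo_bf16.safetensors"))
# A plain bf16 file reports only 'BF16'; a "scaled fp8" file shows 'F8_E4M3' tensors
# plus extra scale tensors.
```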
However, when any LoRA is enabled with any model, the VRAM allocation is off: a huge MB buffer is reserved instead of the VRAM actually being used, which also causes overflow into lowvram patches.
Also, with this update, if I try to use something like Technically Color Z it just ruins the image quality.
Images with and without the LoRA:
How big is your reserved buffer and what is your GPU? Can you post the load stats line?
The update made LoRAs apply correctly on Z Image; before that, most of the LoRA weights were getting skipped, so the LoRAs were much weaker than they should have been.
> How big is your reserved buffer and what is your GPU? Can you post the load stats line?
The buffer is 1.6 GB with a LoRA added, leaving me using only about 3.6 GB of VRAM total. Without a LoRA the buffer is roughly 100 MB and I use about 5 GB of VRAM total.
My GPU is Pascal. Before 0.3.75/0.3.76 this issue didn't exist.
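For what it's worth, here's a rough snippet to check whether that gap is really memory the PyTorch allocator is holding as reserved-but-unused (this is only the allocator's view, not the same number as ComfyUI's load stats line):

```python
import torch

def report_vram(device=0):
    """Print allocated vs reserved VRAM as seen by the PyTorch CUDA allocator."""
    alloc = torch.cuda.memory_allocated(device) / 1024**2
    reserved = torch.cuda.memory_reserved(device) / 1024**2
    total = torch.cuda.get_device_properties(device).total_memory / 1024**2
    print(f"allocated: {alloc:.0f} MB | reserved: {reserved:.0f} MB | total: {total:.0f} MB")

report_vram()
```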
> The update made LoRAs apply correctly on Z Image; before that, most of the LoRA weights were getting skipped, so the LoRAs were much weaker than they should have been.
Yeah, but now it's just completely destroying the image (like the image I posted) for me unless I use a resolution like 1088x1440.
@comfyanonymous The VRAM allocation issue also occurs with Flux2 LoRAs. When I use a LoRA, the reserved buffer is huge with a lot of free VRAM, and it's causing significant speed degradation.
With LoRA (RTX 4090, 19.92s/it, 12636.00 MB buffer reserved, lowvram patches: 152):
Without LoRA (12.20s/it, 972.00 MB buffer reserved, lowvram patches: 0):
I'm using core nodes to load the weights and the LoRA:
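Roughly the equivalent graph in API format as a Python dict; the node and field names are written from memory rather than copied from my exported workflow, and the file names are placeholders, so treat this as a sketch:

```python
import json

# Sketch of the two core loader nodes in ComfyUI API format (names from memory,
# file names are placeholders -- double-check against your own exported workflow).
prompt = {
    "1": {
        "class_type": "UNETLoader",
        "inputs": {
            "unet_name": "flux2_dev.safetensors",
            "weight_dtype": "default",  # the field the "weight_dtype default" suggestion below refers to
        },
    },
    "2": {
        "class_type": "LoraLoaderModelOnly",
        "inputs": {
            "model": ["1", 0],                 # take the MODEL output of node 1
            "lora_name": "my_lora.safetensors",
            "strength_model": 1.0,
        },
    },
}
print(json.dumps(prompt, indent=2))
```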
To ensure that the issue was not related to my installation, I created a new virtual environment and reinstalled all dependencies before testing.
Use weight_dtype default.
Same result:
OK @VandersonQk, I think I see your problem, and I'm working on it. I'm going to track this one over here, which I'm pretty sure is the same report:
https://github.com/comfyanonymous/ComfyUI/issues/11058
The VRAM allocation bug is currently happening on all models for me (when using a LoRA).