
Flux LoRA getting loaded only partially on a low VRAM GPU after commit 08f92d55e934c19f753b47ec4c51760c68bbe2b7

Open mhirki opened this issue 1 year ago • 5 comments

Expected Behavior

The LoRA should be applied fully; the image should look like this: fluxs_1723188953_0001_before

Actual Behavior

The LoRA is only being partially applied, producing an image with a diminished effect: fluxs_1723189106_0001_after_08f92d55e934c19f753b47ec4c51760c68bbe2b7

Here's an image with LoRA strength set to zero for comparison: fluxs_1723189152_0001_strength_0

Steps to Reproduce

I'm using a potato GPU with only 8 GB of VRAM. I'm using my own LoRA from here: https://huggingface.co/mikaelh/flux-sanna-marin-lora-v0.2-fp8. Presumably any other Flux LoRA will hit the same issue.

Workflows are attached in the images above.

Debug Logs

got prompt
model weight dtype torch.float8_e4m3fn, manual cast: torch.float16
model_type FLOW
Model doesn't have a device attribute.
clip missing: ['text_projection.weight']
Requested to load FluxClipModel_
Loading 1 new model
Requested to load Flux
Loading 1 new model
loaded in lowvram mode 6233.7625
100%|██████████| 4/4 [00:42<00:00, 10.60s/it]
Using pytorch attention in VAE
Using pytorch attention in VAE
Model doesn't have a device attribute.
Requested to load AutoencodingEngine
Loading 1 new model
Prompt executed in 68.62 seconds

Other

With some manual bisecting I traced the issue back to commit 08f92d55e934c19f753b47ec4c51760c68bbe2b7. Before this commit everything works fine; after it, I get the diminished effect. The issue persists even with the latest version.

mhirki avatar Aug 09 '24 08:08 mhirki

I wonder if this is what I'm seeing. Flux LoRAs were working fine, but now they seem to have almost no effect, with no errors or warnings in the output.

markrmiller avatar Aug 13 '24 06:08 markrmiller

LoRAs still seem to work fine; they might just be weaker when the weights are in fp8, so you might have to bump up the strength a bit.
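A toy illustration of why a LoRA delta can weaken when merged into fp8 weights: float8 e4m3 keeps only 3 explicit mantissa bits, so a small delta added to a much larger base weight can round away entirely. This pure-Python sketch models only the mantissa rounding (real e4m3 also has a limited exponent range and subnormals), and `round_mantissa` is a made-up helper, not ComfyUI code:

```python
import math

def round_mantissa(x, bits=3):
    """Round x to `bits` explicit mantissa bits, mimicking float8 e4m3
    precision. Exponent range and subnormals are ignored -- toy model only."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)          # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (bits + 1)
    return round(m * scale) / scale * 2 ** e

w, delta = 1.0, 0.01              # base weight and a small LoRA contribution
merged = round_mantissa(w + delta)
print(merged)                     # 1.0 -- the delta is rounded away entirely
```

This is consistent with the suggestion above: when the delta partially survives rounding, raising the LoRA strength compensates for the lost fraction.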

comfyanonymous avatar Aug 13 '24 06:08 comfyanonymous

I tested the latest changes on my 8 GB card and the issue still persists. I did briefly test the LoRA on a 24 GB card a couple of days ago and it was working fine there.

mhirki avatar Aug 13 '24 07:08 mhirki

Can you try with 16-bit weights to see if it works?

comfyanonymous avatar Aug 13 '24 08:08 comfyanonymous

Results with weight_dtype = default are even worse: now the LoRA has almost no effect. fluxs_00013_default_precision

I did also test commit 39fb74c5bd13a1dccf4d7293a2f7a755d9f43cbd from a few minutes ago and it made no difference.

mhirki avatar Aug 13 '24 08:08 mhirki

Just adding that this also seems to be the case for me. LoRAs applied in ComfyUI don't match how the LoRA is expected to perform; the effect seems very weak. The same LoRA applied to the same FP8 model in Forge performs as expected, and no missing keys are reported in the console. The same seeds produce "similar" images, but Forge's output matches what the LoRA should be accomplishing. (It's an art-style LoRA, and the likeness to the style is next to nothing in ComfyUI.) A strength of 1 in ComfyUI seems more like 0.25 compared to Forge's strength of 1.
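For reference, LoRA patching is commonly defined as W' = W + strength * (alpha / rank) * (B @ A), so strength scales the delta linearly -- which is why a quarter-strength effect reads as "strength 0.25". A minimal numpy sketch (names and shapes are illustrative, not ComfyUI's or Forge's actual code):

```python
import numpy as np

def apply_lora(W, A, B, alpha, strength):
    """Patch a weight matrix with a low-rank delta:
    W' = W + strength * (alpha / rank) * (B @ A)."""
    rank = A.shape[0]
    return W + strength * (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)).astype(np.float32)   # base weight
A = rng.standard_normal((4, 8)).astype(np.float32)   # rank-4 down-projection
B = rng.standard_normal((8, 4)).astype(np.float32)   # rank-4 up-projection

full = apply_lora(W, A, B, alpha=4.0, strength=1.0)
weak = apply_lora(W, A, B, alpha=4.0, strength=0.25)
# A strength of 0.25 recovers exactly a quarter of the full delta:
assert np.allclose(weak - W, 0.25 * (full - W))
```

If one frontend applies only part of the delta (e.g. through precision loss or skipped keys), the output looks like the other frontend running at a fractionally lower strength.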

rabidcopy avatar Aug 16 '24 19:08 rabidcopy

Commit 83f343146ae1e8ccaf21da5b012bf59c78b97179 fixes this issue.

mhirki avatar Aug 16 '24 22:08 mhirki

Wanted to add that 83f3431 also resolved the issue I had. Not sure if https://github.com/comfyanonymous/ComfyUI/commit/bb222ceddb232aafafa99cd4dec38b3719c29d7d was still necessary (I haven't tried it).

rabidcopy avatar Aug 17 '24 23:08 rabidcopy

I can't rerun Queue Prompt. The first queue is normal, but when I queue again a second time, I get the same result or a burned image. I tried adjusting my steps and CFG, but nothing happened; I just get a burned image.

listmaster21 avatar Aug 21 '24 05:08 listmaster21

I can't rerun Queue Prompt. The first queue is normal, but when I queue again a second time, I get the same result or a burned image. I tried adjusting my steps and CFG, but nothing happened; I just get a burned image.

Does this phenomenon occur in a workflow built using only core nodes?

ltdrdata avatar Aug 21 '24 11:08 ltdrdata