
Very High VRAM usage when using lora with flux

Open axel578 opened this issue 1 year ago • 14 comments

Expected Behavior

The LoRA should not eat an extra 10 GB of VRAM.

Actual Behavior

I'm running Flux Schnell fp8 on a 3090. When I apply two rank-64 LoRAs to the model, it uses all VRAM until it starts offloading, and generation of course slows down.

Steps to Reproduce

Just apply two rank-64 LoRAs to Flux Schnell fp8.

Debug Logs

None

Other

No response

axel578 avatar Aug 29 '24 20:08 axel578

Try updating: update/update_comfyui.bat if you are on the standalone.

comfyanonymous avatar Aug 29 '24 23:08 comfyanonymous

Try updating: update/update_comfyui.bat if you are on the standalone.

I updated it just now and it still uses 10 GB for a single rank-64 LoRA: image

(Yes, it's really the LoRA in the screenshot taking the 10 GB; 12 GB are already occupied by the Schnell model.)

axel578 avatar Aug 30 '24 09:08 axel578

I updated it just now and it still uses 10 GB for a single rank-64 LoRA: image

It likely has to make a complete patched copy of the Flux model, and it probably does the patching in whatever precision the LoRA is in, or maybe it can only do it in fp16 or higher. So even though you're using Flux fp8, it might have to upcast it to fp16/bf16 to apply the LoRA patch. I don't know if it can then downcast the patched model back to fp8, but in theory it should be able to once it's done.
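For illustration, here is a minimal sketch of that idea; the function and names are hypothetical, not ComfyUI's actual patching code, and the exact dtypes may differ:

```python
import torch

# Hypothetical sketch of why merging a LoRA into an fp8 checkpoint needs a
# temporary higher-precision copy of each patched weight: fp8 tensors can't
# be multiplied/added directly, so the base weight is upcast, patched, and
# then (ideally) downcast back to fp8.
def apply_lora_to_weight(weight_fp8: torch.Tensor,
                         lora_up: torch.Tensor,    # (out_features, rank)
                         lora_down: torch.Tensor,  # (rank, in_features)
                         alpha: float = 1.0) -> torch.Tensor:
    w = weight_fp8.to(torch.bfloat16)                # temporary full-size copy
    delta = (lora_up.to(torch.bfloat16) @ lora_down.to(torch.bfloat16)) * alpha
    w += delta                                       # merge the low-rank update
    return w.to(torch.float8_e4m3fn)                 # bf16 copy can be freed here
```

Those temporary higher-precision copies are where the extra gigabytes can come from if the downcast back to fp8 doesn't happen, or only happens after everything has been patched.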

RandomGitUser321 avatar Aug 30 '24 10:08 RandomGitUser321

Same issue here. Before the latest ComfyUI update, everything worked fine with exactly the same workflow (two small LoRAs loaded).

Current version: ComfyUI 2630ec28cd
GPU: RTX 4060 Ti
Total VRAM 16379 MB, total RAM 65463 MB
PyTorch version: 2.4.0+cu121

Launch args: --normalvram --fast --use-pytorch-cross-attention

Flux.1 Dev / fp8_e4m3fn

Error occurred when executing KSampler:

Allocation on device

File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 317, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 192, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "G:\StabilityMatrix\Data\Packages\ComfyUI\nodes.py", line 1429, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\nodes.py", line 1396, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 22, in informative_sample
raise e
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 9, in informative_sample
return original_sample(*args, **kwargs) # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\sampling.py", line 420, in motion_sample
return orig_comfy_sample(model, noise, *args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Advanced-ControlNet\adv_control\sampling.py", line 116, in acn_sample
return orig_comfy_sample(model, *args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Advanced-ControlNet\adv_control\utils.py", line 116, in uncond_multiplier_check_cn_sample
return orig_comfy_sample(model, *args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\sample.py", line 43, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 829, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 729, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 706, in sample
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\sampler_helpers.py", line 66, in prepare_sampling
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required, minimum_memory_required=minimum_memory_required)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 542, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 326, in model_load
raise e
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 322, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to, lowvram_model_memory=lowvram_model_memory, load_weights=load_weights, force_patch_weights=force_patch_weights)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 427, in patch_model
self.load(device_to, lowvram_model_memory=lowvram_model_memory, force_patch_weights=force_patch_weights, full_load=full_load)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 393, in load
self.patch_weight_to_device(weight_key, device_to=device_to)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 324, in patch_weight_to_device
out_weight = comfy.float.stochastic_rounding(out_weight, weight.dtype, seed=string_to_seed(key))
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\float.py", line 60, in stochastic_rounding
return manual_stochastic_round_to_float8(value, dtype, generator=generator)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\float.py", line 37, in manual_stochastic_round_to_float8
abs_x[:] = calc_mantissa(abs_x, exponent, normal_mask, MANTISSA_BITS, EXPONENT_BIAS, generator=generator)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\comfy\float.py", line 7, in calc_mantissa
(abs_x / (2.0 ** (exponent - EXPONENT_BIAS)) - 1.0) * (2**MANTISSA_BITS),
File "G:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\_tensor.py", line 41, in wrapped
return f(*args, **kwargs)
File "G:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\_tensor.py", line 991, in __rpow__
return torch.pow(other, self)

torch.OutOfMemoryError: Allocation on device 

Got an OOM, unloading all loaded models.

buzzjeux avatar Aug 30 '24 13:08 buzzjeux

I got the same issue; loading 2 LoRAs gives an OOM. I tried experimenting with --disable-smart-memory and --reserve-vram, but they don't seem to help.

Reverting to 9230f658232fd94d0beeddb94aed093a1eca82b5 helps.

@comfyanonymous this commit (7985ff88b9a7099378b5f2026bee5da63d3fc53f) breaks things; after it I started getting OOMs.

dan4ik94 avatar Aug 30 '24 16:08 dan4ik94

I got the same issue; loading 2 LoRAs gives an OOM. I tried experimenting with --disable-smart-memory and --reserve-vram, but they don't seem to help.

I have only 8 GB of VRAM. --disable-smart-memory didn't work for me. --reserve-vram worked, but we need to keep increasing the value until it works for a specific task (a rough sketch of the idea follows the list):

  • --reserve-vram 1.2 - Flux FP8 + 1 LoRA

  • --reserve-vram 2.0 - Flux FP8 + 2 LoRAs (maybe --reserve-vram 1.6 could also work; I didn't test)

  • --reserve-vram 2.0 - Flux Q4_K_S + InstantX Canny

  • --reserve-vram 2.4 - Flux Q4_K_S + InstantX Canny + 2 LoRAs
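As a rough sketch of the idea (the names below are illustrative, not ComfyUI's actual internals): reserving VRAM effectively means the loader pretends part of the currently free memory doesn't exist when budgeting how much of the model to keep on the GPU, leaving headroom for LoRA patching and inference buffers.

```python
import torch

GIB = 1024 ** 3

# Hypothetical helper: treat `reserve_gb` of the free VRAM as off-limits when
# deciding how many model weights can stay on the GPU.
def usable_vram_bytes(device: torch.device, reserve_gb: float) -> int:
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return max(0, free_bytes - int(reserve_gb * GIB))

# e.g. with --reserve-vram 2.0 on an 8 GB card, only about 6 GB of whatever is
# currently free would be budgeted for model weights.
```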

JorgeR81 avatar Aug 30 '24 18:08 JorgeR81

Well, in theory I can get rid of the OOMs by falling back to system memory, but that's 10x slower. I want the same performance as with c6812947e98eb384250575d94108d9eb747765d9.

dan4ik94 avatar Aug 30 '24 22:08 dan4ik94

It's still problematic on the latest version @comfyanonymous

axel578 avatar Aug 31 '24 23:08 axel578

Same problem here. Use Fooocus AI instead until they fix this annoying error.

Archviz360 avatar Sep 01 '24 00:09 Archviz360

The bug is fixed for me with the latest update, "Lower fp8 lora memory usage". Thanks!

buzzjeux avatar Sep 03 '24 19:09 buzzjeux

@comfyanonymous @buzzjeux,

Despite using a small LoRA, there's an incredible VRAM increase. I tested it in Diffusers and this issue doesn't occur there.

ComfyUI: Txt2Img (Flux) + LoRA: 44 GB
Diffusers: Txt2Img (Flux) + LoRA: 38 GB

kadirnar avatar Dec 04 '24 08:12 kadirnar

It's only an issue if you actually get an OOM or slowdowns. ComfyUI might store extra things in GPU memory when some is free, to speed up inference compared to Diffusers.

comfyanonymous avatar Dec 05 '24 00:12 comfyanonymous

any solution?

Amit30swgoh avatar Jun 23 '25 08:06 Amit30swgoh

any solution?

You can apply the following options on the command line (an example invocation follows the list):

  1. You can adjust the value of --reserve-vram to forcibly reserve some free memory.

  2. Applying --disable-smart-memory along with --normalvram will disable the features that try to maximize VRAM usage.
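For example, on a source install started from the repository root (adjust the launcher if you use the portable/standalone build), the flags combine like this: `python main.py --normalvram --disable-smart-memory --reserve-vram 2.0`, where 2.0 is the number of gigabytes of VRAM to keep free; raise it until the OOMs stop.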

ltdrdata avatar Jun 23 '25 10:06 ltdrdata