
Flux.2 Dev - Using LoRAs more than doubles inference time

alex-mitov opened this issue 3 weeks ago • 2 comments

Custom Node Testing

Expected Behavior

Inference time stays the same as when not using LoRAs.

Actual Behavior

Inference time more than doubles when using LoRAs.

Steps to Reproduce

Hello,

I've noticed that using a LoRA with Flux.2 Dev more than doubles inference time. This doesn't happen with Flux.1 Dev: even when many LoRAs are used with that model, inference time stays the same as when not using any.
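For what it's worth, my guess (I don't know ComfyUI's internals here, so this is only an assumption) is that the difference comes down to whether the LoRA gets merged into the base weights once up front, or re-applied around every step. A minimal PyTorch sketch of the two strategies, with made-up shapes, rank, and scale:

```python
# Sketch only: contrasts merging a LoRA once vs. re-applying it at runtime.
# Shapes, rank, and scale are made up; this is plain PyTorch, not ComfyUI code.
import torch

torch.manual_seed(0)
base = torch.nn.Linear(4096, 4096, bias=False)
down = torch.randn(16, 4096) * 0.01   # hypothetical rank-16 LoRA factors
up = torch.randn(4096, 16) * 0.01
scale = 1.0

# Strategy A: merge once. After this, every forward costs exactly the same
# as the un-patched model.
merged = torch.nn.Linear(4096, 4096, bias=False)
with torch.no_grad():
    merged.weight.copy_(base.weight + scale * (up @ down))

# Strategy B: rebuild the patched weight on every call. Each forward now
# pays an extra low-rank matmul plus a full-size weight add (and, if the
# weights are offloaded, the transfer too), so the cost scales with steps.
def forward_runtime_patch(x):
    w = base.weight + scale * (up @ down)  # recomputed per call
    return torch.nn.functional.linear(x, w)

x = torch.randn(1, 4096)
assert torch.allclose(merged(x), forward_runtime_patch(x), atol=1e-5)
```

If Flux.2 ends up doing something like strategy B per step (see the `lowvram patches: 87` and `12636.00 MB buffer reserved` lines in the log below) while Flux.1 stays merged, that would line up with the timings I'm seeing, but I can't confirm that from the outside.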

I'm running the latest version of ComfyUI portable, 0.3.76, with frontend version 1.34.3. Please see the screenshot below for the inference times. I've also attached the workflow and the log.

[Screenshot: inference times with and without the LoRA]

flux2-dev_LoRAs_double_inference_time.json

Debug Logs

K:\ComfyUI_windows_portable_flux2>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-all-custom-nodes
Adding extra search path checkpoints F:\AI Models\Checkpoints
Adding extra search path text_encoders F:\AI Models\text_encoders
Adding extra search path text_encoders F:\AI Models\CLIP
Adding extra search path clip_vision F:\AI Models\CLIP Vision
Adding extra search path configs K:\ComfyUI_windows_portable_nightly_pytorch\ComfyUI\models\configs
Adding extra search path controlnet F:\AI Models\Controlnets
Adding extra search path diffusion_models F:\AI Models\diffusion_models
Adding extra search path diffusion_models F:\AI Models\UNet
Adding extra search path diffusers F:\AI Models\Diffusers
Adding extra search path embeddings F:\AI Models\Embeddings
Adding extra search path loras F:\AI Models\LoRas
Adding extra search path upscale_models F:\AI Models\Upscalers
Adding extra search path vae F:\AI Models\VAE
Adding extra search path ipadapter F:\AI Models\IPAdapter
Adding extra search path SEEDVR2 F:\AI Models\SEEDVR2
Checkpoint files will always be loaded safely.
K:\ComfyUI_windows_portable_flux2\python_embeded\Lib\site-packages\torch\cuda\__init__.py:283: UserWarning:
    Found GPU1 NVIDIA GeForce GTX 1080 Ti which is of cuda capability 6.1.
    Minimum and Maximum cuda capability supported by this version of PyTorch is
    (7.5) - (12.0)

  warnings.warn(
K:\ComfyUI_windows_portable_flux2\python_embeded\Lib\site-packages\torch\cuda\__init__.py:304: UserWarning:
    Please install PyTorch with a following CUDA
    configurations:  12.6 following instructions at
    https://pytorch.org/get-started/locally/

  warnings.warn(matched_cuda_warn.format(matched_arches))
K:\ComfyUI_windows_portable_flux2\python_embeded\Lib\site-packages\torch\cuda\__init__.py:326: UserWarning:
NVIDIA GeForce GTX 1080 Ti with CUDA capability sm_61 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_75 sm_80 sm_86 sm_90 sm_100 sm_120.
If you want to use the NVIDIA GeForce GTX 1080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(
Total VRAM 32607 MB, total RAM 95896 MB
pytorch version: 2.9.1+cu130
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 5090 : cudaMallocAsync
Using async weight offloading with 2 streams
Enabled pinned memory 43153.0
working around nvidia conv3d memory bug.
Using pytorch attention
Python version: 3.13.9 (tags/v3.13.9:8183fa5, Oct 14 2025, 14:09:13) [MSC v.1944 64 bit (AMD64)]
ComfyUI version: 0.3.76
ComfyUI frontend version: 1.34.3
[Prompt Server] web root: K:\ComfyUI_windows_portable_flux2\python_embeded\Lib\site-packages\comfyui_frontend_package\static
Total VRAM 32607 MB, total RAM 95896 MB
pytorch version: 2.9.1+cu130
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 5090 : cudaMallocAsync
Using async weight offloading with 2 streams
Enabled pinned memory 43153.0
Skipping loading of custom nodes
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server

To see the GUI go to: http://127.0.0.1:8188
Exception in callback _ProactorBasePipeTransport._call_connection_lost()
handle: <Handle _ProactorBasePipeTransport._call_connection_lost()>
Traceback (most recent call last):
  File "asyncio\events.py", line 89, in _run
  File "asyncio\proactor_events.py", line 165, in _call_connection_lost
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Using MixedPrecisionOps for text encoder: 210 quantized layers
Requested to load Flux2TEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 1280.59 MB loaded, full load: True
CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float16
Found quantization metadata (version 1.0)
Detected mixed precision quantization: 128 layers quantized
Using mixed precision operations: 128 quantized layers
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load Flux2
loaded partially; 27731.69 MB usable, 15057.02 MB loaded, 18756.00 MB offloaded, 12636.00 MB buffer reserved, lowvram patches: 87
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [00:56<00:00,  2.24s/it]
Requested to load AutoencoderKL
loaded completely; 10596.61 MB usable, 160.31 MB loaded, full load: True
Prompt executed in 100.18 seconds
got prompt
loaded partially; 27612.32 MB usable, 27180.02 MB loaded, 6633.00 MB offloaded, 432.00 MB buffer reserved, lowvram patches: 0
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [00:26<00:00,  1.06s/it]
Requested to load AutoencoderKL
Unloaded partially: 2160.02 MB freed, 25020.00 MB remains loaded, 432.00 MB buffer reserved, lowvram patches: 0
loaded completely; 281.16 MB usable, 160.31 MB loaded, full load: True
Prompt executed in 30.91 seconds
got prompt
Requested to load Flux2
loaded partially; 27249.85 MB usable, 14589.02 MB loaded, 19224.00 MB offloaded, 12636.00 MB buffer reserved, lowvram patches: 89
100%|██████████████████████████████████████████████████████████████████████████████████| 25/25 [00:57<00:00,  2.32s/it]
Requested to load AutoencoderKL
loaded completely; 11064.61 MB usable, 160.31 MB loaded, full load: True
Prompt executed in 73.90 seconds
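
One pattern worth noting in the log above: both runs with the LoRA report roughly 2.2-2.3 s/it with `lowvram patches: 87`/`89` and `12636.00 MB buffer reserved`, while the run without it does 1.06 s/it with `lowvram patches: 0` and only 432 MB reserved. A throwaway sketch (plain Python, nothing ComfyUI-specific) for pulling those numbers out of a log like this one:

```python
# Throwaway sketch: summarize per-run speed and lowvram patch counts from a
# ComfyUI log. Assumes the exact log format shown in this issue.
import re

def summarize(log_text: str) -> list[dict]:
    runs = []
    # Each sampled prompt in the log starts with a "got prompt" line.
    for chunk in log_text.split("got prompt")[1:]:
        it = re.search(r"(\d+\.\d+)s/it", chunk)
        patches = re.search(r"lowvram patches: (\d+)", chunk)
        total = re.search(r"Prompt executed in (\d+\.\d+) seconds", chunk)
        runs.append({
            "s_per_it": float(it.group(1)) if it else None,
            "lowvram_patches": int(patches.group(1)) if patches else None,
            "total_s": float(total.group(1)) if total else None,
        })
    return runs

# Against the log above this yields:
# [{'s_per_it': 2.24, 'lowvram_patches': 87, 'total_s': 100.18},
#  {'s_per_it': 1.06, 'lowvram_patches': 0,  'total_s': 30.91},
#  {'s_per_it': 2.32, 'lowvram_patches': 89, 'total_s': 73.9}]
```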

Other

No response

alex-mitov · Dec 03 '25 00:12

I think I see this problem. I can reproduce the unreasonably large numbers, but not the 12 GB reservations just yet. Your LoRA isn't googling for me (yet); if you could link me the actual LoRA, I can confirm with greater confidence.

I have a strong lead on what your issue is.

rattus128 · Dec 03 '25 02:12

Found the LoRA.

rattus128 · Dec 03 '25 02:12