
Flux Fill doesn't work as a segment model

Open: willhsmit opened this issue 6 months ago

Expected Behavior

I should be able to use Flux Fill as the model for refining a <segment> tag in either of two ways: (i) use Flux Fill as the base model for an inpaint, leave the Segment Model unset, and use the <segment> tag to further refine part of the inpaint with Flux Fill; or (ii) use Flux Dev as the base model for a generation, set the Segment Model to Flux Fill, and use the <segment> tag to further refine part of the generation with Flux Fill.

(Both of these use cases fail, for what I think are related reasons.)

Actual Behavior

In case (i) (Flux Fill as both base and segment model), the segment pass redraws a miniature version of the entire input image inside the segment area.

You can see in the following image that the entire forest, not just the segment or the masked area, is redrawn over the woman's face (prompt: a blond woman smiling at the camera, wearing a green shirt, waist up medium shot <segment:face,0.9,0.5> closeup of a young blond woman's face, smiling).

[Image: case (i) output, with the forest redrawn inside the face area]
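Why a miniature? A minimal sketch, assuming (as the Other section below argues) that the refine sampler is fed the full input image rather than the face crop. In the workflow, node 119 (ImageScale) resizes whatever the refine pass produces to the box width and height reported by SwarmMaskBounds (node 107), and node 121 pastes it at the box's x and y, so a full-scene output ends up shrunk into the face box. `Box`, `map_point`, and the box coordinates are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Crop bounds as reported by a mask-bounds node: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def map_point(px: float, py: float, source_w: int, source_h: int, bounds: Box) -> tuple[float, float]:
    """Map a pixel of the refine output to its position in the final image.

    Mimics resizing the whole source_w x source_h output to the box size
    (like node 119, ImageScale) and pasting it at the box corner (like node 121).
    """
    scale_x = bounds.width / source_w
    scale_y = bounds.height / source_h
    return bounds.x + px * scale_x, bounds.y + py * scale_y


# Hypothetical face box; the real bounds come from the segmentation mask.
face = Box(x=400, y=180, width=256, height=256)

# If the refine pass redraws the full 1024x1024 scene, its corners land on the
# corners of the face box, so the whole scene appears shrunk 4x inside it.
print(map_point(0, 0, 1024, 1024, face))        # (400.0, 180.0)
print(map_point(1024, 1024, 1024, 1024, face))  # (656.0, 436.0)
```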

In case (ii) (Flux Dev as base model and Flux Fill as segment model), the generation fails altogether with:

17:10:33.972 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 349, in execute
17:10:33.973 [Warning] [ComfyUI-0/STDERR] output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
17:10:33.973 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17:10:33.974 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 224, in get_output_data
17:10:33.975 [Warning] [ComfyUI-0/STDERR] return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
17:10:33.975 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17:10:33.976 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 196, in _map_node_over_list
17:10:33.976 [Warning] [ComfyUI-0/STDERR] process_inputs(input_dict, i)
17:10:33.977 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 185, in process_inputs
17:10:33.977 [Warning] [ComfyUI-0/STDERR] results.append(getattr(obj, func)(**inputs))
17:10:33.978 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
17:10:33.979 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\nodes.py", line 417, in encode
17:10:33.979 [Warning] [ComfyUI-0/STDERR] x = (pixels.shape[1] // 8) * 8
17:10:33.980 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^
17:10:33.980 [Warning] [ComfyUI-0/STDERR] AttributeError: 'NoneType' object has no attribute 'shape'
17:10:33.981 [Warning] [ComfyUI-0/STDERR]
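The traceback reduces to a null input. A minimal sketch in plain Python (no ComfyUI or torch needed): a JSON null in the workflow arrives as `None`, and the rounding expression copied from the failing nodes.py line then raises exactly this AttributeError. `FakeImage` and `snap_dim1_to_8` are hypothetical stand-ins, not ComfyUI code.

```python
import json

# Node 116's "pixels" and "noise_mask" inputs from the case (ii) workflow;
# JSON null becomes Python None when the workflow is parsed.
inputs = json.loads('{"pixels": null, "noise_mask": false}')


def snap_dim1_to_8(pixels) -> int:
    """Same expression as the failing line in nodes.py: round dimension 1 down to a multiple of 8."""
    return (pixels.shape[1] // 8) * 8


class FakeImage:
    """Stand-in for an image tensor; only the .shape attribute is needed here."""

    def __init__(self, shape):
        self.shape = shape


# A real image-like input works as expected.
print(snap_dim1_to_8(FakeImage((1, 1030, 1024, 3))))  # 1024

# A null input reproduces the error from the log.
message = None
try:
    snap_dim1_to_8(inputs["pixels"])
except AttributeError as err:
    message = str(err)
print(message)  # 'NoneType' object has no attribute 'shape'
```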

Steps to Reproduce

  1. Set Flux Fill as the model, with Flux guidance scale = 30.
  2. Drag in an image for i2i (example below)
  3. Mask an area (mask shown)
  4. Enter a text prompt with a <segment> tag (example: a blond woman smiling at the camera, wearing a green shirt, waist up medium shot <segment:face,0.9,0.5> closeup of a young blond woman's face, smiling).
  5. Generate.
  6. See that the segmenter has correctly identified the face region, but has pasted a redrawn version of the entire image into the face area, rather than drawing a face.

[Images: example input image and mask]
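For reference when reading the workflows, here is how the fields of `<segment:face,0.9,0.5>` appear to map onto the generated nodes. This is a hypothetical parser, not SwarmUI's own code. It assumes the second field is the refinement strength (which SwarmUI calls creativity) and the third is the mask threshold; both readings are consistent with the case (ii) log, where SwarmClipSeg gets `threshold: 0.5` and the refine sampler starts at step 3 of 30. The start-step formula is one formula that matches those numbers; SwarmUI's actual computation may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: <segment:TEXT,CREATIVITY,THRESHOLD>, optionally with the
# "//cid=N" suffix that appears in the logged T2IParamInput prompt.
SEGMENT_TAG = re.compile(r"<segment:([^,>]+),([0-9.]+),([0-9.]+)(?://cid=\d+)?>")


@dataclass
class SegmentTag:
    match_text: str    # text handed to the segmenter (SwarmClipSeg "match_text")
    creativity: float  # how much of the refine pass is re-noised
    threshold: float   # mask threshold (SwarmClipSeg "threshold")


def parse_segment_tag(prompt: str) -> Optional[SegmentTag]:
    m = SEGMENT_TAG.search(prompt)
    if m is None:
        return None
    return SegmentTag(m.group(1).strip(), float(m.group(2)), float(m.group(3)))


def refine_start_step(steps: int, creativity: float) -> int:
    # round() rather than int(): 30 * (1 - 0.9) lands just below 3.0 in floating point.
    return round(steps * (1 - creativity))


tag = parse_segment_tag("photograph of a blonde woman standing in a forest "
                        "<segment:face,0.9,0.5//cid=11> closeup photograph of a woman's face")
print(tag)                                    # SegmentTag(match_text='face', creativity=0.9, threshold=0.5)
print(refine_start_step(30, tag.creativity))  # 3
```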

Debug Logs

No errors for case (i); case (ii) has the following:

2025-05-28 17:10:10.534 [Info] User local requested 1 image with model 'flux-dev-nunchaku-int4/transformer_blocks.safetensors'...
2025-05-28 17:10:10.535 [Debug] [BackendHandler] Backend request #2 for model flux-dev-nunchaku-int4/transformer_blocks.safetensors, maxWait=7.00:00:00.
2025-05-28 17:10:10.535 [Debug] [BackendHandler] Backend request #2 found correct model on #0
2025-05-28 17:10:10.535 [Debug] [BackendHandler] Backend request #2 finished.
2025-05-28 17:10:10.535 [Debug] Auto-selected first available VAE of compat class 'flux-1', VAE 'ae.sft' will be applied
2025-05-28 17:10:10.535 [Debug] Auto-selected first available VAE of compat class 'flux-1', VAE 'ae.sft' will be applied
2025-05-28 17:10:12.577 [Debug] [ComfyUI-0/STDERR] got prompt
2025-05-28 17:10:12.604 [Debug] [ComfyUI-0/STDERR]
2025-05-28 17:10:14.073 [Debug] [ComfyUI-0/STDERR] 0%| | 0/30 [00:00<?, ?it/s]
2025-05-28 17:10:15.282 [Debug] [ComfyUI-0/STDERR] 3%|▎ | 1/30 [00:01<00:42, 1.47s/it]
2025-05-28 17:10:16.481 [Debug] [ComfyUI-0/STDERR] 7%|▋ | 2/30 [00:02<00:36, 1.32s/it]
2025-05-28 17:10:17.681 [Debug] [ComfyUI-0/STDERR] 10%|█ | 3/30 [00:03<00:34, 1.26s/it]
2025-05-28 17:10:18.881 [Debug] [ComfyUI-0/STDERR] 13%|█▎ | 4/30 [00:05<00:32, 1.24s/it]
2025-05-28 17:10:20.107 [Debug] [ComfyUI-0/STDERR] 17%|█▋ | 5/30 [00:06<00:30, 1.22s/it]
2025-05-28 17:10:21.337 [Debug] [ComfyUI-0/STDERR] 23%|██▎ | 7/30 [00:07<00:21, 1.09it/s]
2025-05-28 17:10:22.597 [Debug] [ComfyUI-0/STDERR] 30%|███ | 9/30 [00:08<00:16, 1.27it/s]
2025-05-28 17:10:23.854 [Debug] [ComfyUI-0/STDERR] 40%|████ | 12/30 [00:09<00:11, 1.63it/s]
2025-05-28 17:10:25.141 [Debug] [ComfyUI-0/STDERR] 50%|█████ | 15/30 [00:11<00:08, 1.87it/s]
2025-05-28 17:10:26.401 [Debug] [ComfyUI-0/STDERR] 63%|██████▎ | 19/30 [00:12<00:04, 2.26it/s]
2025-05-28 17:10:27.634 [Debug] [ComfyUI-0/STDERR] 73%|███████▎ | 22/30 [00:13<00:03, 2.30it/s]
2025-05-28 17:10:28.865 [Debug] [ComfyUI-0/STDERR] 80%|████████ | 24/30 [00:15<00:02, 2.09it/s]
2025-05-28 17:10:30.098 [Debug] [ComfyUI-0/STDERR] 87%|████████▋ | 26/30 [00:16<00:02, 1.95it/s]
2025-05-28 17:10:31.311 [Debug] [ComfyUI-0/STDERR] 93%|█████████▎| 28/30 [00:17<00:01, 1.85it/s]
2025-05-28 17:10:32.515 [Debug] [ComfyUI-0/STDERR] 97%|█████████▋| 29/30 [00:18<00:00, 1.55it/s]
2025-05-28 17:10:32.515 [Debug] [ComfyUI-0/STDERR] 100%|██████████| 30/30 [00:19<00:00, 1.34it/s]
2025-05-28 17:10:32.515 [Debug] [ComfyUI-0/STDERR] 100%|██████████| 30/30 [00:19<00:00, 1.51it/s]
2025-05-28 17:10:33.955 [Debug] [ComfyUI-0/STDERR] !!! Exception during processing !!! 'NoneType' object has no attribute 'shape'
2025-05-28 17:10:33.971 [Warning] [ComfyUI-0/STDERR] Traceback (most recent call last):
2025-05-28 17:10:33.972 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 349, in execute
2025-05-28 17:10:33.973 [Warning] [ComfyUI-0/STDERR] output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
2025-05-28 17:10:33.973 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-28 17:10:33.974 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 224, in get_output_data
2025-05-28 17:10:33.975 [Warning] [ComfyUI-0/STDERR] return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
2025-05-28 17:10:33.975 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-28 17:10:33.976 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 196, in _map_node_over_list
2025-05-28 17:10:33.976 [Warning] [ComfyUI-0/STDERR] process_inputs(input_dict, i)
2025-05-28 17:10:33.977 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 185, in process_inputs
2025-05-28 17:10:33.977 [Warning] [ComfyUI-0/STDERR] results.append(getattr(obj, func)(**inputs))
2025-05-28 17:10:33.978 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-28 17:10:33.978 [Warning] [ComfyUI-0/STDERR] File "C:\Users\willh\SwarmUI\dlbackend\comfy\ComfyUI\nodes.py", line 417, in encode
2025-05-28 17:10:33.979 [Warning] [ComfyUI-0/STDERR] x = (pixels.shape[1] // 8) * 8
2025-05-28 17:10:33.980 [Warning] [ComfyUI-0/STDERR] ^^^^^^^^^^^^
2025-05-28 17:10:33.980 [Warning] [ComfyUI-0/STDERR] AttributeError: 'NoneType' object has no attribute 'shape'
2025-05-28 17:10:33.981 [Warning] [ComfyUI-0/STDERR]
2025-05-28 17:10:33.981 [Debug] [ComfyUI-0/STDERR] Prompt executed in 21.39 seconds
2025-05-28 17:10:34.186 [Debug] Failed to process comfy workflow for inputs T2IParamInput(prompt: photograph of a blonde woman standing in a forest <segment:face,0.9,0.5//cid=11> closeup photograph of a woman's face, model: flux-dev-nunchaku-int4/transformer_blocks, seed: 1617395626, steps: 30, cfgscale: 1, aspectratio: 1:1, width: 1024, height: 1024, scheduler: simple, fluxguidancescale: 3.5, segmentmodel: flux-fill-nunchaku-int4/transformer_blocks, colorcorrectionbehavior: Linear, nunchakucachethreshold: 0.1, automaticvae: True, negativeprompt: ) with raw workflow {
  "4": { "class_type": "NunchakuFluxDiTLoader", "inputs": { "model_path": "flux-dev-nunchaku-int4", "cache_threshold": 0.1, "attention": "nunchaku-fp16", "cpu_offload": "auto", "device_id": 0, "data_type": "float16", "i2f_mode": "enabled" } },
  "100": { "class_type": "DualCLIPLoader", "inputs": { "clip_name1": "t5xxl_enconly.safetensors", "clip_name2": "clip_l_sdxl_base.safetensors", "type": "flux" } },
  "101": { "class_type": "VAELoader", "inputs": { "vae_name": "ae.sft" } },
  "5": { "class_type": "EmptySD3LatentImage", "inputs": { "batch_size": 1, "height": 1024, "width": 1024 } },
  "6": { "class_type": "SwarmClipTextEncodeAdvanced", "inputs": { "clip": [ "100", 0 ], "steps": 30, "prompt": "photograph of a blonde woman standing in a forest", "width": 1536, "height": 1536, "target_width": 1024, "target_height": 1024, "guidance": 3.5 } },
  "7": { "class_type": "SwarmClipTextEncodeAdvanced", "inputs": { "clip": [ "100", 0 ], "steps": 30, "prompt": "", "width": 832, "height": 832, "target_width": 1024, "target_height": 1024, "guidance": 3.5 } },
  "10": { "class_type": "SwarmKSampler", "inputs": { "model": [ "4", 0 ], "noise_seed": 1617395626, "steps": 30, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "positive": [ "6", 0 ], "negative": [ "7", 0 ], "latent_image": [ "5", 0 ], "start_at_step": 0, "end_at_step": 10000, "return_with_leftover_noise": "disable", "add_noise": "enable", "var_seed": 0, "var_seed_strength": 0, "sigma_min": -1, "sigma_max": -1, "rho": 7, "previews": "default", "tile_sample": False, "tile_size": 1024 } },
  "8": { "class_type": "VAEDecode", "inputs": { "vae": [ "101", 0 ], "samples": [ "10", 0 ] } },
  "102": { "class_type": "NunchakuFluxDiTLoader", "inputs": { "model_path": "flux-fill-nunchaku-int4", "cache_threshold": 0.1, "attention": "nunchaku-fp16", "cpu_offload": "auto", "device_id": 0, "data_type": "float16", "i2f_mode": "enabled" } },
  "103": { "class_type": "SwarmClipSeg", "inputs": { "images": [ "8", 0 ], "match_text": "face", "threshold": 0.5 } },
  "104": { "class_type": "SwarmMaskBlur", "inputs": { "mask": [ "103", 0 ], "blur_radius": 10, "sigma": 1 } },
  "105": { "class_type": "GrowMask", "inputs": { "mask": [ "104", 0 ], "expand": 16, "tapered_corners": True } },
  "106": { "class_type": "SwarmMaskThreshold", "inputs": { "mask": [ "105", 0 ], "min": 0.01, "max": 1 } },
  "107": { "class_type": "SwarmMaskBounds", "inputs": { "mask": [ "106", 0 ], "grow": 16 } },
  "108": { "class_type": "SwarmImageCrop", "inputs": { "image": [ "8", 0 ], "x": [ "107", 0 ], "y": [ "107", 1 ], "width": [ "107", 2 ], "height": [ "107", 3 ] } },
  "109": { "class_type": "CropMask", "inputs": { "mask": [ "106", 0 ], "x": [ "107", 0 ], "y": [ "107", 1 ], "width": [ "107", 2 ], "height": [ "107", 3 ] } },
  "110": { "class_type": "SwarmImageScaleForMP", "inputs": { "image": [ "108", 0 ], "width": 1024, "height": 1024, "can_shrink": True } },
  "111": { "class_type": "VAEEncodeForInpaint", "inputs": { "vae": [ "101", 0 ], "pixels": [ "110", 0 ], "mask": [ "109", 0 ], "grow_mask_by": 6 } },
  "112": { "class_type": "SetLatentNoiseMask", "inputs": { "samples": [ "111", 0 ], "mask": [ "109", 0 ] } },
  "113": { "class_type": "DifferentialDiffusion", "inputs": { "model": [ "102", 0 ] } },
  "114": { "class_type": "SwarmClipTextEncodeAdvanced", "inputs": { "clip": [ "100", 0 ], "steps": 30, "prompt": "closeup photograph of a woman's face", "width": 1536, "height": 1536, "target_width": 1024, "target_height": 1024, "guidance": 3.5 } },
  "115": { "class_type": "SolidMask", "inputs": { "value": 1, "width": 1024, "height": 1024 } },
  "116": { "class_type": "InpaintModelConditioning", "inputs": { "positive": [ "114", 0 ], "negative": [ "7", 0 ], "vae": [ "101", 0 ], "pixels": null, "mask": [ "115", 0 ], "noise_mask": False } },
  "117": { "class_type": "SwarmKSampler", "inputs": { "model": [ "113", 0 ], "noise_seed": 1617395628, "steps": 30, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "positive": [ "116", 0 ], "negative": [ "116", 1 ], "latent_image": [ "116", 2 ], "start_at_step": 3, "end_at_step": 10000, "return_with_leftover_noise": "disable", "add_noise": "enable", "var_seed": 0, "var_seed_strength": 0, "sigma_min": -1, "sigma_max": -1, "rho": 7, "previews": "default", "tile_sample": False, "tile_size": 1024 } },
  "118": { "class_type": "VAEDecode", "inputs": { "vae": [ "101", 0 ], "samples": [ "117", 0 ] } },
  "119": { "class_type": "ImageScale", "inputs": { "image": [ "118", 0 ], "width": [ "107", 2 ], "height": [ "107", 3 ], "upscale_method": "lanczos", "crop": "disabled" } },
  "120": { "class_type": "ThresholdMask", "inputs": { "mask": [ "109", 0 ], "value": 0.001 } },
  "121": { "class_type": "SwarmImageCompositeMaskedColorCorrecting", "inputs": { "destination": [ "8", 0 ], "source": [ "119", 0 ], "mask": [ "120", 0 ], "x": [ "107", 0 ], "y": [ "107", 1 ], "resize_source": False, "correction_method": "Linear" } },
  "9": { "class_type": "SwarmSaveImageWS", "inputs": { "images": [ "121", 0 ], "bit_depth": "8bit" } }
}
2025-05-28 17:10:34.186 [Debug] Refused to generate image for local: ComfyUI execution error: 'NoneType' object has no attribute 'shape'

Other

If you look at the Comfy Workflow tab for (i), you can see that the second SwarmKSampler node doesn't use the output of the first SwarmKSampler node at all. Both get their inputs from InpaintModelConditioning nodes that use the same pixels and the same mask. The mask generated by segmentation and the image generated by the first SwarmKSampler node serve only as the target onto which the second SwarmKSampler node's output is composited; they are never passed to the second SwarmKSampler node as inputs.
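This wiring can be checked mechanically on a raw workflow. The case (i) JSON is only attached, so this minimal sketch uses the case (ii) workflow from the debug log, where the same pattern holds. It assumes the ComfyUI API format seen there (each link is a `["node_id", output_index]` pair); the workflow is trimmed to its link-valued inputs, and `is_link`/`upstream` are hypothetical helpers.

```python
from typing import Optional


def is_link(value) -> bool:
    """ComfyUI API-format links look like ["source_node_id", output_index]."""
    return (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str) and isinstance(value[1], int))


def upstream(workflow: dict, node_id: str, seen: Optional[set] = None) -> set:
    """Return the ids of every node that node_id depends on, directly or indirectly."""
    if seen is None:
        seen = set()
    for value in workflow[node_id]["inputs"].values():
        if is_link(value) and value[0] not in seen:
            seen.add(value[0])
            upstream(workflow, value[0], seen)
    return seen


# Case (ii) workflow from the debug log, trimmed to link-valued inputs (plus the
# null "pixels" on node 116). Loader and constant nodes keep empty inputs.
WORKFLOW = {
    "4": {"inputs": {}}, "5": {"inputs": {}}, "100": {"inputs": {}},
    "101": {"inputs": {}}, "102": {"inputs": {}}, "115": {"inputs": {}},
    "6": {"inputs": {"clip": ["100", 0]}},
    "7": {"inputs": {"clip": ["100", 0]}},
    "114": {"inputs": {"clip": ["100", 0]}},
    "10": {"inputs": {"model": ["4", 0], "positive": ["6", 0], "negative": ["7", 0], "latent_image": ["5", 0]}},
    "8": {"inputs": {"vae": ["101", 0], "samples": ["10", 0]}},
    "113": {"inputs": {"model": ["102", 0]}},
    "116": {"inputs": {"positive": ["114", 0], "negative": ["7", 0], "vae": ["101", 0],
                       "pixels": None, "mask": ["115", 0]}},
    "117": {"inputs": {"model": ["113", 0], "positive": ["116", 0], "negative": ["116", 1],
                       "latent_image": ["116", 2]}},
    "118": {"inputs": {"vae": ["101", 0], "samples": ["117", 0]}},
    "119": {"inputs": {"image": ["118", 0]}},
    "121": {"inputs": {"destination": ["8", 0], "source": ["119", 0]}},
}

# The segment sampler (117) never depends on the base sampler (10) or its decode (8)...
print(sorted(upstream(WORKFLOW, "117")))  # ['100', '101', '102', '113', '114', '115', '116', '7']
# ...while the composite (121) uses the base image only as its destination.
print("8" in upstream(WORKFLOW, "121"))   # True
```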

Left side of the case (i) workflow: [Image]

Attached workflow JSON: FluxSegment.json

I think the problem is that the segment processing creates a KSampler here and passes the previous output in as a latent:

https://github.com/mcmonkeyprojects/SwarmUI/blob/57ae713fee2a49ced90c1b97f14c4962b8055b72/src/BuiltinExtensions/ComfyUIBackend/WorkflowGeneratorSteps.cs#L1414

but inside the KSampler setup, Flux Fill has special-case logic

https://github.com/mcmonkeyprojects/SwarmUI/blob/57ae713fee2a49ced90c1b97f14c4962b8055b72/src/BuiltinExtensions/ComfyUIBackend/WorkflowGenerator.cs#L1474

that uses pixels from FinalInputImage and FinalMask as inputs instead of the latent that was passed in.

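I suspect the same thing is going wrong in case (ii), except that in a pure t2i generation with Flux Dev as the base, there is no FinalInputImage at all for Flux Fill (as the Segment Model) to use. The InpaintModelConditioning node (node 116 in the workflow above) is therefore built with "pixels": null, which is where the 'NoneType' object has no attribute 'shape' exception comes from.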
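A cheap guard against this class of failure would be to reject a workflow with unconnected inputs before it is queued, so the user sees which node is missing pixels instead of a ComfyUI stack trace. A minimal sketch; `find_null_inputs` is hypothetical and not part of SwarmUI or ComfyUI. Note that the logged workflow mixes JSON `null` with Python-style `True`/`False`, so the snippet below uses a JSON-valid copy of nodes 115 and 116.

```python
import json

# Nodes 115 and 116 from the case (ii) workflow, with False -> false so it parses as JSON.
WORKFLOW_JSON = """
{
  "115": {"class_type": "SolidMask", "inputs": {"value": 1, "width": 1024, "height": 1024}},
  "116": {
    "class_type": "InpaintModelConditioning",
    "inputs": {"positive": ["114", 0], "negative": ["7", 0], "vae": ["101", 0],
               "pixels": null, "mask": ["115", 0], "noise_mask": false}
  }
}
"""


def find_null_inputs(workflow: dict) -> list:
    """List (node_id, class_type, input_name) for every input left as null/None."""
    return [(node_id, node.get("class_type", "?"), name)
            for node_id, node in workflow.items()
            for name, value in node.get("inputs", {}).items()
            if value is None]


problems = find_null_inputs(json.loads(WORKFLOW_JSON))
print(problems)  # [('116', 'InpaintModelConditioning', 'pixels')]
```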

Left side of the case (ii) workflow: [Image]
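One possible direction, sketched in Python rather than SwarmUI's C# and not taken from the codebase: when the sampler is building a segment pass, the Flux Fill special case could take its pixels and mask from the segment crop instead of FinalInputImage and FinalMask. In the case (ii) workflow those nodes already exist: node 110 (the scaled crop) and node 109 (the cropped mask) feed VAEEncodeForInpaint (node 111), and the resulting SetLatentNoiseMask latent (node 112) is never consumed, while node 116 receives "pixels": null and a full-frame SolidMask (node 115). The function name and signature below are hypothetical.

```python
from typing import Optional

Link = list  # a ComfyUI API-format link: ["node_id", output_index]


def fill_conditioning_inputs(positive: Link, negative: Link, vae: Link, *,
                             segment_pixels: Optional[Link] = None,
                             segment_mask: Optional[Link] = None,
                             final_input_image: Optional[Link] = None,
                             final_mask: Optional[Link] = None) -> dict:
    """Hypothetical wiring for an InpaintModelConditioning node.

    Prefer the segment's own crop and mask; fall back to the global input image
    and mask only when no segment is being refined. Refuse to emit a null input.
    """
    pixels = segment_pixels if segment_pixels is not None else final_input_image
    mask = segment_mask if segment_mask is not None else final_mask
    if pixels is None or mask is None:
        raise ValueError("Flux Fill conditioning needs both pixels and a mask")
    return {"positive": positive, "negative": negative, "vae": vae,
            "pixels": pixels, "mask": mask, "noise_mask": False}


# Case (ii): refine the face crop using nodes the workflow already builds.
segment_inputs = fill_conditioning_inputs(["114", 0], ["7", 0], ["101", 0],
                                          segment_pixels=["110", 0], segment_mask=["109", 0])
print(segment_inputs["pixels"], segment_inputs["mask"])  # ['110', 0] ['109', 0]

# Pure t2i with no segment refs and no FinalInputImage: fail loudly instead of sending null.
try:
    fill_conditioning_inputs(["114", 0], ["7", 0], ["101", 0], final_mask=["115", 0])
except ValueError as err:
    print("refused:", err)
```

Pairing node 110 with node 109 mirrors what the workflow already feeds to VAEEncodeForInpaint, so this sketch does not introduce any new image/mask combination.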

willhsmit, May 29 '25 01:05