
[Feature Request] Please add "reference_only" ControlNet feature

Open MoonMoon82 opened this issue 2 years ago • 16 comments

There is a new ControlNet feature called "reference_only", which seems to be a preprocessor that works without any ControlNet model. Please add this feature to the ControlNet nodes.

Kind regards

https://www.youtube.com/watch?v=tBwmbTwMxfQ

MoonMoon82 avatar May 16 '23 08:05 MoonMoon82

@BlenderNeko Maybe you have an idea how this "reference_only" preprocessor could work in ComfyUI?

MoonMoon82 avatar May 28 '23 16:05 MoonMoon82

Reference only is way more involved, as it is technically not a ControlNet and would require changes to the UNet code. There has been some talk and thought about implementing it in Comfy, but so far the consensus was to at least wait a bit for the reference_only implementation in the cnet repo to stabilize, or to have some source that clearly explains what they are doing and why.
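
To give a rough idea of what "changes to the UNet code" means here: the trick is to let the UNet's self-attention layers attend to the reference image's features in addition to the generation's own. A minimal sketch of that idea (names and shapes are illustrative, not the actual A1111 or Comfy API):

import torch

def reference_attn(q, k, v, ref_k, ref_v):
    # Self-attention that also attends to the reference image's features:
    # concatenating the reference keys/values onto the generation's own
    # lets the denoiser "copy" appearance from the reference.
    k2 = torch.cat([k, ref_k], dim=1)  # (batch, tokens + ref_tokens, dim)
    v2 = torch.cat([v, ref_v], dim=1)
    attn = torch.softmax(q @ k2.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v2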

It's likely that we'd see an implementation of this before any kind of reference only support, simply because of ease of implementation. Perhaps that could in part fill a similar role.

BlenderNeko avatar May 28 '23 17:05 BlenderNeko

+1 for the request to have controlnet reference.

I was trying to make emotion cards for SillyTavern. In ComfyUI that should have been a doddle: set up a workflow that generates an image, then uses that image to create 26 more, each with a different emotion but the same person. Without reference only, though, that's just not doable.

GamingDaveUk avatar Jun 10 '23 23:06 GamingDaveUk

Guess it's not going to be implemented. https://desuarchive.org/g/thread/94223958/#94225957

catboxanon avatar Jun 22 '23 19:06 catboxanon

Here's a simple node for it; if it works fine I'll put it somewhere more visible. Download and save that reference_only.py to your custom_nodes folder: https://gist.github.com/comfyanonymous/343e5675f9a2c8281fde0c440df2e2c6

Copy the JSON below and paste it (Ctrl-V) into the UI to load the workflow:

{
  "last_node_id": 15,
  "last_link_id": 37,
  "nodes": [
    {
      "id": 8,
      "type": "VAEDecode",
      "pos": [
        1209,
        188
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 7
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 8
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            9
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 6,
      "type": "CLIPTextEncode",
      "pos": [
        233,
        117
      ],
      "size": {
        "0": 422.84503173828125,
        "1": 164.31304931640625
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 3
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            4
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "crude drawing of girl"
      ]
    },
    {
      "id": 7,
      "type": "CLIPTextEncode",
      "pos": [
        237,
        370
      ],
      "size": {
        "0": 425.27801513671875,
        "1": 180.6060791015625
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 5
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            6
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "CLIPTextEncode"
      },
      "widgets_values": [
        "text, watermark"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        863,
        186
      ],
      "size": {
        "0": 315,
        "1": 262
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 37
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 4
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 6
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 34
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            7
          ],
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        719286772344905,
        "fixed",
        20,
        8,
        "euler",
        "normal",
        1
      ]
    },
    {
      "id": 9,
      "type": "SaveImage",
      "pos": [
        1548,
        180
      ],
      "size": [
        1454.6668601568254,
        548.2885143635223
      ],
      "flags": {},
      "order": 9,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 9
        }
      ],
      "properties": {},
      "widgets_values": [
        "refer/ComfyUI"
      ]
    },
    {
      "id": 4,
      "type": "CheckpointLoaderSimple",
      "pos": [
        -563,
        510
      ],
      "size": {
        "0": 315,
        "1": 98
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            32
          ],
          "slot_index": 0
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            3,
            5
          ],
          "slot_index": 1
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            8,
            20
          ],
          "slot_index": 2
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "sd_xl_1.0.safetensors"
      ]
    },
    {
      "id": 14,
      "type": "ImageScale",
      "pos": [
        -129,
        763
      ],
      "size": {
        "0": 315,
        "1": 130
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "image",
          "type": "IMAGE",
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            18
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "ImageScale"
      },
      "widgets_values": [
        "nearest-exact",
        768,
        768,
        "center"
      ]
    },
    {
      "id": 13,
      "type": "LoadImage",
      "pos": [
        -483,
        777
      ],
      "size": {
        "0": 315,
        "1": 314
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            19
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "MASK",
          "type": "MASK",
          "links": null,
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "LoadImage"
      },
      "widgets_values": [
        "example.png",
        "image"
      ]
    },
    {
      "id": 15,
      "type": "ReferenceOnlySimple",
      "pos": [
        515,
        675
      ],
      "size": {
        "0": 315,
        "1": 78
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 32,
          "slot_index": 0
        },
        {
          "name": "reference",
          "type": "LATENT",
          "link": 35
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            37
          ],
          "shape": 3,
          "slot_index": 0
        },
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            34
          ],
          "shape": 3,
          "slot_index": 1
        }
      ],
      "properties": {
        "Node name for S&R": "ReferenceOnlySimple"
      },
      "widgets_values": [
        2
      ]
    },
    {
      "id": 12,
      "type": "VAEEncode",
      "pos": [
        248,
        732
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "pixels",
          "type": "IMAGE",
          "link": 18,
          "slot_index": 0
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 20,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            35
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEEncode"
      }
    }
  ],
  "links": [
    [
      3,
      4,
      1,
      6,
      0,
      "CLIP"
    ],
    [
      4,
      6,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      5,
      4,
      1,
      7,
      0,
      "CLIP"
    ],
    [
      6,
      7,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      7,
      3,
      0,
      8,
      0,
      "LATENT"
    ],
    [
      8,
      4,
      2,
      8,
      1,
      "VAE"
    ],
    [
      9,
      8,
      0,
      9,
      0,
      "IMAGE"
    ],
    [
      18,
      14,
      0,
      12,
      0,
      "IMAGE"
    ],
    [
      19,
      13,
      0,
      14,
      0,
      "IMAGE"
    ],
    [
      20,
      4,
      2,
      12,
      1,
      "VAE"
    ],
    [
      32,
      4,
      0,
      15,
      0,
      "MODEL"
    ],
    [
      34,
      15,
      1,
      3,
      3,
      "LATENT"
    ],
    [
      35,
      12,
      0,
      15,
      1,
      "LATENT"
    ],
    [
      37,
      15,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {},
  "version": 0.4
}
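
For those curious, the node itself is small; its interface looks roughly like this (the interesting part, an attn1 patch that makes sampling attend to the reference latent, is in the gist):

class ReferenceOnlySimple:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"model": ("MODEL",),
                             "reference": ("LATENT",),
                             "batch_size": ("INT", {"default": 1, "min": 1, "max": 64})}}

    RETURN_TYPES = ("MODEL", "LATENT")
    FUNCTION = "reference_only"
    CATEGORY = "custom_node_experiments"

Both outputs matter: the patched MODEL and the batched LATENT (reference plus empty latents) have to feed the same KSampler, which is what links 37 and 34 in the workflow above do.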

comfyanonymous avatar Jul 26 '23 17:07 comfyanonymous

I put it in this repo: https://github.com/comfyanonymous/ComfyUI_experiments

comfyanonymous avatar Jul 27 '23 01:07 comfyanonymous

Hello, I'm trying to install reference_only but I get this error:

"title":"ComfyUI_experiments/reference_only.py at master · comfyanonymous/ComfyUI_experiments","locale":"en"} NameError: name 'true' is not defined

Cannot import C:\Users\GABSU\comfyui\ComfyUI\custom_nodes\reference_only.py module for custom nodes: name 'true' is not defined

Any idea why this is happening? I've added reference_only.py to the custom_nodes folder. Thank you so much for your work :)

patriciagomesoo avatar Jul 31 '23 14:07 patriciagomesoo

Hey @comfyanonymous, the installation workflow works well for me but the results are pretty bad. Could you share a workflow example that works well for you? I'm far from the results shown in the video.

julien-blanchon avatar Aug 16 '23 23:08 julien-blanchon

Wow! It works perfectly for me. Is it possible to use two reference inputs? To process video frames, for example.

addddd2 avatar Aug 24 '23 00:08 addddd2

@addddd2 I opened a feature request some weeks ago to get something like img2img with the reference_only node: https://github.com/comfyanonymous/ComfyUI_experiments/issues/5

I already tried it on my own, but I guess this kind of img2img does not work that way: the results looked more like the input image than the reference image. Maybe @comfyanonymous could say something more about it...

MoonMoon82 avatar Aug 24 '23 06:08 MoonMoon82

Sorry for this code, I did the best I could. I hope the author does it properly.

It takes two reference latents and one latent intended for img2img as input:

--- reference_only.py	2023-07-26 22:24:24.000000000 +0300
+++ reference_only3.py	2023-08-25 00:51:27.233217800 +0300
@@ -1,10 +1,12 @@
 import torch
 
-class ReferenceOnlySimple:
+class ReferenceOnlySimple3:
     @classmethod
     def INPUT_TYPES(s):
         return {"required": { "model": ("MODEL",),
                               "reference": ("LATENT",),
+                              "reference2": ("LATENT",),
+                              "input": ("LATENT",),
                               "batch_size": ("INT", {"default": 1, "min": 1, "max": 64})
                               }}
 
@@ -13,28 +15,31 @@
 
     CATEGORY = "custom_node_experiments"
 
-    def reference_only(self, model, reference, batch_size):
+    def reference_only(self, model, reference, reference2, input, batch_size):
         model_reference = model.clone()
         size_latent = list(reference["samples"].shape)
         size_latent[0] = batch_size
-        latent = {}
-        latent["samples"] = torch.zeros(size_latent)
+        latent = input
 
-        batch = latent["samples"].shape[0] + reference["samples"].shape[0]
+        batch = latent["samples"].shape[0] + reference["samples"].shape[0] + reference2["samples"].shape[0]
+  
+        
         def reference_apply(q, k, v, extra_options):
             k = k.clone().repeat(1, 2, 1)
             offset = 0
             if q.shape[0] > batch:
                 offset = batch
+                
+            re = extra_options["transformer_index"] % 2
 
             for o in range(0, q.shape[0], batch):
                 for x in range(1, batch):
-                    k[x + o, q.shape[1]:] = q[o,:]
+                    k[x + o, q.shape[1]:] = q[o + re,:]
 
             return q, k, k
 
         model_reference.set_model_attn1_patch(reference_apply)
-        out_latent = torch.cat((reference["samples"], latent["samples"]))
+        out_latent = torch.cat((reference["samples"], reference2["samples"], latent["samples"]))
         if "noise_mask" in latent:
             mask = latent["noise_mask"]
         else:
@@ -47,8 +52,8 @@
             mask = mask.repeat(latent["samples"].shape[0], 1, 1)
 
         out_mask = torch.zeros((1,mask.shape[1],mask.shape[2]), dtype=torch.float32, device="cpu")
-        return (model_reference, {"samples": out_latent, "noise_mask": torch.cat((out_mask, mask))})
+        return (model_reference, {"samples": out_latent, "noise_mask": torch.cat((out_mask,out_mask, mask))})
 
 NODE_CLASS_MAPPINGS = {
-    "ReferenceOnlySimple": ReferenceOnlySimple,
+    "ReferenceOnlySimple3": ReferenceOnlySimple3,
 }

addddd2 avatar Aug 24 '23 22:08 addddd2

Can you show your workflow? Somehow it is not working very well for me.

ntdviet avatar Aug 25 '23 01:08 ntdviet

@ntdviet the workflows are here: https://github.com/comfyanonymous/ComfyUI_experiments/issues/5

addddd2 avatar Sep 23 '23 12:09 addddd2

I made a custom node that supports reference only and reference only + adain, and it can also adjust the style strength.

This is a diffusers-based custom node, so it is used differently from Comfy's KSampler-based ones.

https://github.com/Jannchie/ComfyUI-J https://civitai.com/models/361265/comfyui-j-diffusers-based-pipeline-nodes
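
For anyone wondering about the "+ adain" part: adaptive instance normalization (AdaIN) shifts the per-channel statistics of the generated features toward the reference's. A minimal sketch of the idea, not the actual ComfyUI-J code (the strength blend is an assumption about how a style-strength knob would typically work):

import torch

def adain(x, ref, strength=1.0, eps=1e-5):
    # Match per-channel mean/std of x to those of the reference features,
    # then blend with the original by `strength` (0 = off, 1 = full AdaIN).
    dims = tuple(range(2, x.dim()))  # spatial dims of (B, C, H, W) features
    mu_x = x.mean(dim=dims, keepdim=True)
    std_x = x.std(dim=dims, keepdim=True) + eps
    mu_r = ref.mean(dim=dims, keepdim=True)
    std_r = ref.std(dim=dims, keepdim=True) + eps
    styled = (x - mu_x) / std_x * std_r + mu_r
    return x + strength * (styled - x)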

Jannchie avatar Mar 21 '24 17:03 Jannchie