[Double Control] What double-control model is most needed?

Open lllyasviel opened this issue 3 years ago • 4 comments

Discussed in https://github.com/lllyasviel/ControlNet/discussions/30

Originally posted by lllyasviel, February 12, 2023: We plan to train some models with "double controls", which use two concatenated control maps, and we are considering using images with holes as the second control map. This would lead to models like "depth-aware inpainting" or "canny-edge-aware inpainting". Please also let us know if you have good suggestions.
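To make the idea concrete, here is a minimal sketch of how such a "double control" hint could be assembled for a depth-aware inpainting setup. It is only an illustration: the function name `make_double_control`, the channel order, and the extra mask channel are assumptions and are not taken from the ControlNet code base.

```python
import numpy as np

def make_double_control(control_map, image, hole_mask):
    """Stack a control map with a holed image into a single hint array.

    control_map: (H, W) or (H, W, C) floats in [0, 1], e.g. depth or Canny edges
    image:       (H, W, 3) floats in [0, 1]
    hole_mask:   (H, W) bools, True where pixels are cut out (the "holes")
    """
    if control_map.ndim == 2:
        control_map = control_map[..., None]  # add a channel axis
    # Second control: the source image with the masked region blanked out.
    holed = np.where(hole_mask[..., None], 0.0, image)
    # Assumption: also pass the mask itself, so holes are distinguishable
    # from pixels that are genuinely black.
    mask_channel = hole_mask[..., None].astype(image.dtype)
    # "Two concatenated control maps": join everything along the channel axis.
    return np.concatenate([control_map, holed, mask_channel], axis=-1)

h, w = 4, 4
depth = np.linspace(0.0, 1.0, h * w).reshape(h, w)
image = np.full((h, w, 3), 0.5)
mask = np.zeros((h, w), dtype=bool)
mask[1:3, 1:3] = True  # a 2x2 hole in the middle

hint = make_double_control(depth, image, mask)
print(hint.shape)  # (4, 4, 5): 1 depth + 3 holed RGB + 1 mask
```

A model trained on such a hint would see the depth everywhere but the image content only outside the hole, which is what makes the inpainting "depth-aware".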

This is a re-post. Please go to the discussion to discuss.

lllyasviel avatar Feb 13 '23 01:02 lllyasviel

I guess pose + single source image control would be useful, at least for anime. Although a custom character dreambooth model with https://github.com/lllyasviel/ControlNet/discussions/12 seems to work, single image pose shifting is really attractive to me.

ajundo avatar Feb 13 '23 03:02 ajundo

  1. depth + segmentation? For example, I would like to render a movie scene.
  2. t-1 rendered frame and t+1 keyframe? This would help when you want to render movies in anime style and want temporal stability in the output. When I try just naive pixel img2img, each output frame is slightly different and the result looks quite noisy.

Take a look at my video made with instruct pix2pix ... https://www.reddit.com/r/StableDiffusion/comments/10x4fkr/pip2pix_marble_terminator/?utm_source=share&utm_medium=web2x&context=3

  3. novel view synthesis? Given one, two or more images of an object, generate a new view of the same object. For example, I have generated a sneaker image and now I want to generate new views so that it can be manufactured. Example: https://thissneakerdoesnotexist.com/3d-info/

batrlatom avatar Feb 13 '23 08:02 batrlatom

Is this simply concatenating additional input channels on the hint image, or actually combining two separately trained control networks?

I would see both as extremely useful.
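The two readings of the question can be contrasted in a toy sketch. Everything here is hypothetical: `toy_control_net` is a per-pixel linear map standing in for a real control network, and summing the outputs of two networks is only one assumed way of combining them, not something the thread confirms.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_control_net(hint, weights):
    """Stand-in for a control network: maps hint channels to feature channels per pixel."""
    return hint @ weights  # (H, W, C_in) @ (C_in, F) -> (H, W, F)

H, W, F = 8, 8, 16
depth = rng.random((H, W, 1))
edges = rng.random((H, W, 1))

# Weights of two separately "trained" single-control networks.
w_depth = rng.standard_normal((1, F))
w_edges = rng.standard_normal((1, F))

# Option A: one network that takes the channel-concatenated hint.
w_joint = np.vstack([w_depth, w_edges])  # (2, F)
out_concat = toy_control_net(np.concatenate([depth, edges], axis=-1), w_joint)

# Option B: two separate networks whose outputs are added together.
out_sum = toy_control_net(depth, w_depth) + toy_control_net(edges, w_edges)

print(out_concat.shape, out_sum.shape)
```

In this linear toy the two options give identical outputs because the joint weights are just the stacked individual weights. Real control networks are nonlinear, so a jointly trained model could learn interactions between the two maps that a sum of independently trained networks cannot, at the cost of needing paired training data for both controls.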

sam598 avatar Feb 13 '23 19:02 sam598

Potentially a naive question, but I'm wondering about using vector inputs like FaceNet/CLIP/etc embeddings as a second control, rather than spatial inputs like depth/edges/etc?

josephrocca avatar Feb 15 '23 09:02 josephrocca