
[ControlNet 1.1] The updating track.

Open lllyasviel opened this issue 1 year ago • 196 comments

We will use this repo to track some discussions for updating to ControlNet 1.1.

lllyasviel avatar Apr 13 '23 04:04 lllyasviel

Update: ControlNet 1.1 is released here.

lllyasviel avatar Apr 13 '23 05:04 lllyasviel

I think we can ignore cnet11 Tile model right now. We are not very sure how to make use of it. The inpainting model may need more considerations in implementation and perhaps we just get other models first.

lllyasviel avatar Apr 13 '23 06:04 lllyasviel

The inpainting model may need more considerations in implementation and perhaps we just get other models first.

I’m the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model.

So at this moment, the inpainting ControlNet cannot restrict its changes to the masked region while leaving the rest untouched, right?

Edit on 2023/04/18: already connected. Check out my extension readme for how to use it.

continue-revolution avatar Apr 13 '23 10:04 continue-revolution

I think we can ignore cnet11 Tile model right now. We are not very sure how to make use of it. The inpainting model may need more considerations in implementation and perhaps we just get other models first.

I have long been working on tiles. Have you tried combining it with noise inversion tricks? I think this can be very good; with a better-trained model it may be comparable to the quality of GigaGAN.

My extension is here -> https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111

I will adapt your tile model to see the result and update it here.

pkuliyi2015 avatar Apr 13 '23 11:04 pkuliyi2015

Yes, the tile model can be a saviour for upscaling with no duplicated details

2blackbar avatar Apr 13 '23 13:04 2blackbar

This thread is already amazing. ^ 3 amazing devs collaborating

halr9000 avatar Apr 13 '23 13:04 halr9000

The inpainting model may need more considerations in implementation and perhaps we just get other models first.

I’m the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model.

So at this moment, the inpainting ControlNet cannot restrict its changes to the masked region while leaving the rest untouched, right?

my gradio demo does not have masked diffusion in it. what is displayed now is just the original result from standard non-masked diffusion. but masked diffusion will be better.

lllyasviel avatar Apr 13 '23 14:04 lllyasviel

The model works as expected in automatic1111 txt2img; it does generate the guided content.

However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while but am still not clear what should be done to make it work.

Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000-step VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it).

See here for one result: https://imgsli.com/MTY5ODQw

pkuliyi2015 avatar Apr 13 '23 14:04 pkuliyi2015

What preprocessor should we use with the tile ControlNet model? Using it without a preprocessor gets "some" results, but the resolution is somewhat lower than if I inpainted with 0.55 denoise, and I have to use CFG 2-3.

2blackbar avatar Apr 13 '23 14:04 2blackbar

The inpainting model may need more considerations in implementation and perhaps we just get other models first.

I’m the author of sd-webui-segment-anything and I am planning to connect my extension to your inpainting model. So at this moment, the inpainting ControlNet cannot restrict its changes to the masked region while leaving the rest untouched, right?

my gradio demo does not have masked diffusion in it. what is displayed now is just the original result from standard non-masked diffusion. but masked diffusion will be better.

Do you think there is a need to wait for an update of this extension? Is the current extension compatible with the new models, especially the inpainting model?

continue-revolution avatar Apr 13 '23 14:04 continue-revolution

The model works as expected in automatic1111 txt2img; it does generate the guided content.

However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while but am still not clear what should be done to make it work.

Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000-step VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it).

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

lllyasviel avatar Apr 13 '23 15:04 lllyasviel

The model works as expected in automatic1111 txt2img; it does generate the guided content. However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while but am still not clear what should be done to make it work. Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000-step VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it).

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

The right one. I must have done something wrong, but so far I haven't been able to fix it.

pkuliyi2015 avatar Apr 13 '23 16:04 pkuliyi2015

Is there a PR in this repo yet for implementing ControlNet v1.1?

ProGamerGov avatar Apr 13 '23 17:04 ProGamerGov

The model works as expected in automatic1111 txt2img; it does generate the guided content. However, when I directly download the model and use it in this extension, it produces severe artifacts. I read the source code for a while but am still not clear what should be done to make it work. Some initial observations:

  • Severe ghost shadows and duplicated contours, regardless of tile overlaps
  • Faded colors in txt2img (even with the 840000-step VAE)
  • No effect when using noise inversion (maybe this is a flaw in my code; I'm checking it).

See here for one result: https://imgsli.com/MTY5ODQw

which one is cn11tile? left or right?

The right one. I must have done something wrong, but so far I haven't been able to fix it.

from the result it looks like your input image is bigger than h/8 × w/8.

for example, if you diffuse at 512×512, your tile needs to be 64×64, and then use 3 cv2.pyrUp calls to interpolate back to 512.

or you can add a Gaussian blur to the inputs to make them smoother

lllyasviel avatar Apr 13 '23 17:04 lllyasviel

Hi, I have a recommended list of updates:

Control Model:
  • Implement the global average pooling before injection (read the "global_average_pooling" item in the yaml file)

Depth:
  • Rename “depth” to “depth_midas”
  • “depth_leres” is already good
  • Add “depth_zoe”

Normal:
  • Add “normal_bae”
  • Remove the previous “normal” (or rename it to “normal_midas”)

Canny/MLSD: already good

Scribble:
  • Rename “fake_scribble” to “scribble_hed”
  • Add “scribble_pidi”
  • Remove “scribble” (it seems that this one just binarizes, which sounds confusing; or just call it "threshold"?)

SoftEdge:
  • Rename “HED” to “softedge_hed”
  • Add “softedge_pidi”
  • Add “softedge_hedsafe” and “softedge_pidisafe”
  • Rename “pidinet” to “sketch_t2iadapter”

Segmentation:
  • Rename “seg” to “seg_ufade20K”
  • Add “seg_ofade20K” and “seg_ofcoco”

Openpose:
  • “openpose” is good
  • Remove “openpose_hand”
  • Add “openpose_full”

Lineart:
  • Add “lineart”, “lineart_coarse”, and “lineart_anime”

Shuffle:
  • Add “shuffle”

What do you think?

lllyasviel avatar Apr 13 '23 18:04 lllyasviel
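The global-average-pooling item in the list above could look roughly like this minimal numpy sketch; the dict stands in for the parsed yaml, and the function name and shapes are illustrative assumptions:

```python
import numpy as np

# Hypothetical parsed yaml; the real flag is the "global_average_pooling"
# item in each model's .yaml file.
config = {"global_average_pooling": True}

def inject(control_feat, unet_feat, config):
    """Add a ControlNet feature map (C, H, W) into a U-Net feature map."""
    if config.get("global_average_pooling", False):
        # Collapse each control channel to its spatial mean before adding.
        control_feat = control_feat.mean(axis=(1, 2), keepdims=True)
    return unet_feat + control_feat  # broadcasts when pooled

out = inject(np.full((4, 8, 8), 2.0), np.ones((4, 8, 8)), config)
```

With the flag on, only each channel's average reaches the U-Net, so spatial detail in the control map is deliberately discarded.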

That list looks good to me.

Are the instructpix2pix and inpainting models already working out of the box? The former seemed to work, but it gave me mixed results; I wasn't going to judge the quality yet without knowing whether something is missing. The inpainting model I haven't tried yet. The tile model I assume will come a bit later, since the model itself is currently unfinished.

catboxanon avatar Apr 13 '23 19:04 catboxanon

PR WIP at https://github.com/Mikubill/sd-webui-controlnet/pull/742.

CCRcmcpe avatar Apr 13 '23 19:04 CCRcmcpe

Recent renaming of annotators made some downstream developers unhappy. We can implement renamings as display-name changes instead of ID changes, which cause API breaks.

Also on naming: the annotator name should imply which cnet model should be used, and vice versa.

CCRcmcpe avatar Apr 13 '23 19:04 CCRcmcpe

i have an idea. what about adding some descriptions to the yaml file of each cnet, like "xxx_canny.yaml" having "desc: this model needs canny preprocessor", and showing it in the gradio ui?

lllyasviel avatar Apr 13 '23 19:04 lllyasviel

The gradio part seems less than ideal. List items cannot show hover info; at least, I tried the DDIM sampler item in the WebUI and it doesn't, though if you select it and hover over the selection box, it shows.

CCRcmcpe avatar Apr 13 '23 19:04 CCRcmcpe

i mean like adding a gradio.Label or something and showing some desc text from the model yaml after a model is loaded. besides, i think for the api it is ok to have alias names.

lllyasviel avatar Apr 13 '23 19:04 lllyasviel

if u think it is ok i will begin to work on all 14 yaml files

lllyasviel avatar Apr 13 '23 19:04 lllyasviel

What about the old cnets (prior to 1.1)? They have no isolated yamls. I think it's better to implement this at the code level, which is also localization-friendly. I will wait for a response from the repo owner.

CCRcmcpe avatar Apr 13 '23 19:04 CCRcmcpe

old cnets can just use blank text. we can show text only when a desc is available

lllyasviel avatar Apr 13 '23 19:04 lllyasviel
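Put together, the desc idea above might reduce to something like this hypothetical helper, where the dict stands in for a parsed model yaml and old cnets simply fall back to blank text:

```python
def model_desc(config):
    """Return the description text to show in the UI.

    `config` is the parsed yaml for a model, or None for old (pre-1.1)
    cnets that have no isolated yaml; those get blank text.
    """
    if not config:
        return ""
    return config.get("desc", "")

print(model_desc({"desc": "this model needs canny preprocessor"}))
print(model_desc(None))  # old cnet -> blank
```

The returned string could then feed a gradio label that updates whenever a model is loaded.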

@Mikubill why does invert always binarize images? image

lllyasviel avatar Apr 13 '23 20:04 lllyasviel

now I have to invert it outside in Photoshop on my own to use the lineart model image

lllyasviel avatar Apr 13 '23 20:04 lllyasviel

That is a known issue, will be fixed.

CCRcmcpe avatar Apr 13 '23 23:04 CCRcmcpe

Would be awesome to auto-select the most likely model after the preprocessor is selected, and vice versa. It won't prevent people from changing it, but it will save a needed step 90% of the time.


i have an idea. what about adding some descriptions to yaml file of each cnet like "xxx_canny.yaml" has a "desc: this model needs canny preprocessor" and show it to gradio ui?


halr9000 avatar Apr 13 '23 23:04 halr9000
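The auto-selection suggested above could be sketched as a simple lookup; the table and function below are illustrative assumptions, not the extension's actual pairing logic:

```python
# Illustrative preprocessor -> model keyword table (not the real mapping).
PREFERRED = {
    "canny": "canny",
    "depth_midas": "depth",
    "lineart": "lineart",
}

def auto_select_model(preprocessor, available_models):
    """Pick the first installed model whose name matches the preprocessor."""
    keyword = PREFERRED.get(preprocessor)
    if keyword is None:
        return None
    for name in available_models:
        if keyword in name:
            return name
    return None

models = ["control_v11p_sd15_canny", "control_v11f1p_sd15_depth"]
```

Since the choice only pre-fills the dropdown, users can still override it when the guess is wrong.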

That anime colorise works nicely even at 1080x1080 res, and it works not only for anime stuff; it works best with anime-like models. This is 768:
image image

2blackbar avatar Apr 14 '23 01:04 2blackbar

That list looks good to me.

Are the instructpix2pix and inpainting models already working out of the box? The former seemed to work, but it gave me mixed results; I wasn't going to judge the quality yet without knowing whether something is missing. The inpainting model I haven't tried yet. The tile model I assume will come a bit later, since the model itself is currently unfinished.

yes, ip2p is very experimental. it is a model marked as [e].

but this model should be at least as robust as the original ip2p. it seems that the original ip2p is also not very robust.

perhaps we can improve it by also putting the original image into i2i and using the "denoising strength" to improve robustness.

lllyasviel avatar Apr 14 '23 06:04 lllyasviel