sd-webui-controlnet

✨ Added guidance start parameter.

Open · ashen-sensored opened this pull request 1 year ago

ashen-sensored · Feb 26 '23 16:02

By shifting the guidance start time, allowing the vanilla UNet to lay out the foundation at high noise before ControlNet applies its correction, it is possible to retain most of the information from the original generation. [comparison images]
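
Conceptually, the behavior described above can be pictured with a minimal sketch like the one below (illustrative only; the function name and the exact rounding are assumptions, not the extension's actual code):

```python
# Minimal sketch of the guidance-start idea (illustrative only): ControlNet
# correction is withheld until the sampler passes the guidance_start fraction
# of its steps, so the vanilla UNet alone handles the early, high-noise steps.

def controlnet_enabled(step: int, total_steps: int, guidance_start: float) -> bool:
    """True once the current step reaches the guidance-start threshold."""
    return step >= int(guidance_start * total_steps)

total_steps = 20
schedule = [
    "ControlNet" if controlnet_enabled(s, total_steps, guidance_start=0.19) else "UNet only"
    for s in range(total_steps)
]
print(schedule)  # first 3 of 20 steps are "UNet only", the rest use ControlNet
```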

ashen-sensored · Feb 26 '23 16:02

Demo: Guidance Start: 0 (default behavior) [image]

Guidance Start: 0.19 [image]

ashen-sensored · Feb 26 '23 18:02

Looks good. Also need to make some changes in the API handler.

Mikubill · Feb 26 '23 18:02

Looks good. Also need to make some changes in the API handler.

The changes related to the new parameter have been applied to api.py and xyz_grid_support.py. I did a directory search; I think I covered all related locations.

ashen-sensored · Feb 26 '23 19:02

The following change is probably required:

params = [None] * 14  →  params = [None] * PARAM_COUNT
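
For context, the suggestion might look roughly like the sketch below (only the name PARAM_COUNT comes from the comment above; the value and surrounding code are assumptions rather than the extension's actual source):

```python
# Hypothetical illustration of the suggested change. Using a named constant
# avoids a stale hard-coded length whenever a new per-unit parameter
# (such as guidance_start) is added to the ControlNet UI.

PARAM_COUNT = 15  # assumed value: the previous 14 fields plus guidance_start

def empty_unit_params() -> list:
    # Before: params = [None] * 14   (silently wrong once a field is added)
    params = [None] * PARAM_COUNT
    return params

print(len(empty_unit_params()))  # 15
```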

aiton-sd · Feb 27 '23 06:02

I think that when guidance was added (I updated the ControlNet extension for the first time last night), it broke the ability to use it in batches. I need to do some more testing, but when doing prompt travel with ControlNet enabled, which used to work just fine, I was only seeing the effect of ControlNet on the first one or two images of the prompt travel. That would make sense given how guidance works, if there's a step counter that only gets reset when Generate is clicked.

I also think the way these guidance sliders are laid out is confusing and misleading. When first experimenting with them, I couldn't understand why I was getting radically different results between 0.16 and 0.17. It's not at all obvious from the name that the value is actually a fraction of your total steps, which gets multiplied out and truncated to an integer. That isn't an intuitive design: it forces you to do the math to work out what value to set for the number of steps you actually want, and it works differently from the bracket notation in AUTOMATIC1111 itself, where you specify the number of steps directly.
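
To make the step arithmetic concrete, here is a small sketch (assuming the start step is computed as int(value * total_steps); the extension may round differently):

```python
# Illustrates the fraction-to-step conversion described above (assumed
# behavior: the start step is the slider value times the step count,
# truncated to an integer).

def fraction_for_start_step(start_step: int, total_steps: int) -> float:
    """Slider value needed so ControlNet kicks in at a given step."""
    return start_step / total_steps

print(fraction_for_start_step(4, 20))  # 0.2 -> ControlNet starts at step 4 of 20
print(int(0.19 * 20), int(0.20 * 20))  # 3 4 -> nearby fractions land on different steps
```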

That said, it's undeniably a cool addition!

Edit: Come to think of it, I only had the one guidance slider, for the guidance end. Looks like I need to update again and see whether the first bug got fixed in the process of the second slider being added...

enn-nafnlaus · Feb 27 '23 10:02

How do you even make it work? No matter what I do, I get nothing good...

Magicalore · Feb 27 '23 13:02

How does ControlNet not work for you? It works fine for me, right out of the box. Try following a tutorial step by step and tell us which one you used and at which step it goes wrong for you.

enn-nafnlaus · Feb 27 '23 13:02

How does ControlNet not work for you? It works fine for me, right out of the box. Try following a tutorial step by step and tell us which one you used and at which step it goes wrong for you.

ControlNet itself works fine. I'm trying to get better hands to appear in the image by combining a depth map of some hands with the openpose model, but whether I set Guidance Start to 0 or to 0.19 I get much worse results; even with openpose disabled, it can't recognize from the depth map that these are hands. [Screenshot_3, Screenshot_4, Screenshot_5]

Magicalore · Feb 27 '23 13:02

I think the example here wasn't made very clear because it was broken up into several comments with little explanation. This is what it's doing.

The original generated image is below; no ControlNet was used: [image]

The hand is obviously a mess. So, they took this image into Blender and created a hand depth pass as a guide. This is the depth pass they used: [image]

They then used this depth pass with ControlNet. However, the default settings make ControlNet affect the output for the entire generation. This means all the empty information in the depth pass (the black area) is accounted for as well, which degrades the output and makes it stop matching the original prompt. This is that output: [image]

So, why not delay ControlNet from kicking in, let the original noise do its thing, and then use it to fix the hand? That's why this PR was created. Now that ControlNet is delayed (in this case by only a few steps, since the value used is 0.19, which relative to 20 steps is not much), the original composition can play out while the hand still gets fixed. This is that image: [image]

tl;dr: This seems to be a way to implement passes that only control certain elements, without destroying the original image. I don't think it will be all that useful if you're generating something completely from scratch, i.e. trying to use the hand depth pass on a completely different seed; it requires knowing what gets generated normally. A depth pass from Blender is also a bit overkill imo; imagine using a different module like scribble instead.

catboxanon · Feb 27 '23 14:02

Oh, thank you! Yes, my bad, I misunderstood how this worked!

Magicalore · Feb 27 '23 17:02

A quick and simple question for whoever has a deep understanding of ControlNet's structure:

Why can't we have a spatial weight on it, i.e. a "mask" applied to ControlNet itself? Then we could mask out everything except the hand on the depth map, and in theory it would not mess with the other parts of the image.

Is this not physically possible? Isn't the weight applied to every pixel/latent independently (even control down to 8x8 squares would be great!)?
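
Not an answer, but the idea being asked about could be sketched conceptually as below (none of this is existing extension code; the masked_residual helper and tensor shapes are invented for illustration, assuming ControlNet residuals are added to UNet feature maps):

```python
import torch
import torch.nn.functional as F

# Conceptual sketch only: a user-supplied spatial mask (1 = let ControlNet act,
# 0 = ignore it) is resized to each block's feature-map resolution and
# multiplied into the ControlNet residual before it is added to the UNet.

def masked_residual(residual: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """residual: (B, C, H, W) ControlNet output for one block; mask: (1, 1, h, w) in [0, 1]."""
    m = F.interpolate(mask, size=residual.shape[-2:], mode="bilinear", align_corners=False)
    return residual * m

# Example: keep ControlNet guidance only in the lower-right quadrant (say, the hand).
mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 32:, 32:] = 1.0
residual = torch.randn(1, 320, 32, 32)  # made-up block shape
print(masked_residual(residual, mask).shape)  # torch.Size([1, 320, 32, 32])
```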

aleksusklim · Feb 27 '23 18:02

(Quoting catboxanon's explanation above.)

Although that does help, wouldn't it be enough to edit the depth map of the image itself? With the same settings you would generate the same image, only edited. It's also useful to keep the same image in img2img as a guide + pose + depth edit (or just scribble). In any case, I've found much more unique uses for this feature. Thanks for the addition. 👍

AbyszOne · Feb 27 '23 18:02