
ControlNet with SAM mask condition

Open gasvn opened this issue 1 year ago • 7 comments

Thanks for the great ControlNet. We have trained a ControlNet based on the SAM segmentation mask for fine-grained image generation. This model can help to use SAM for image generation and editing. We hope this side project can help with more interesting applications. Looking for contributions and suggestions! https://github.com/sail-sg/EditAnything

gasvn avatar Apr 09 '23 09:04 gasvn

how is the label color computed? random color?

lllyasviel avatar Apr 09 '23 19:04 lllyasviel

Yes, the label colors are randomly generated. We didn't use the random colors for training; they're just for visualization.
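For anyone curious, a random per-label palette like the one described can be sketched in a few lines. This is a minimal illustration, not the EditAnything code; the function name and the label-map representation (a 2D list of integer label IDs) are assumptions:

```python
import random

def colorize_labels(label_map, seed=0):
    """Map each integer label in a 2D label map to a random RGB color.

    The colors are only for visualization; per the authors, the model
    is not trained on these particular color values.
    """
    rng = random.Random(seed)  # seeded so the palette is reproducible
    palette = {}
    colored = []
    for row in label_map:
        out_row = []
        for label in row:
            if label not in palette:
                palette[label] = (rng.randrange(256),
                                  rng.randrange(256),
                                  rng.randrange(256))
            out_row.append(palette[label])
        colored.append(out_row)
    return colored, palette

# Toy 2x3 label map with two segments
labels = [[0, 0, 1],
          [0, 1, 1]]
colored, palette = colorize_labels(labels)
```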

gasvn avatar Apr 10 '23 04:04 gasvn

cool! is it fine tuned from canny or scribble model?

lllyasviel avatar Apr 10 '23 13:04 lllyasviel

We trained it from Stable Diffusion 2 using the SAM mask as the condition. No pretrained canny or scribble model was used. Do you think using those models as the pretrained model would be better?

gasvn avatar Apr 11 '23 07:04 gasvn

If we can have segmented ControlNet training, this would open up a ton of possibilities. @lllyasviel, any chance of adding this to the repo? : )

alelordelo avatar Apr 11 '23 10:04 alelordelo

@alelordelo I was actually going to ask if this is possible, as I plan to train ControlNet on my dataset, which consists of segmentations and prompts.

I assume it is currently not possible to train with segmentations, then?

sethupavan12 avatar Apr 12 '23 14:04 sethupavan12

I think not now... : /

alelordelo avatar Apr 12 '23 18:04 alelordelo

@gasvn @lllyasviel @sethupavan12, did you make any progress on the segmented training?

alelordelo avatar Jul 05 '23 20:07 alelordelo

@alelordelo Yes... in a manner of speaking. Basically, I simplified my problem. Initially, I had a segmentation image like the following:

[image]

But then I realised I didn't need the other body parts, since my main focus was just the clothing/body area, as I was working on a virtual try-on problem.

So I masked out the useless areas and kept only the areas I wanted to train on.
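The masking step described above can be sketched roughly as follows. This is a toy illustration, not the actual pipeline: the function name, the label IDs, and the nested-list image representation are all hypothetical (a real implementation would operate on NumPy/PIL arrays):

```python
def keep_region(image, label_map, keep_labels, fill=(0, 0, 0)):
    """Keep pixels whose segmentation label is in keep_labels;
    blank out everything else (e.g. non-clothing body parts)."""
    keep = set(keep_labels)
    return [
        [px if lbl in keep else fill
         for px, lbl in zip(img_row, lbl_row)]
        for img_row, lbl_row in zip(image, label_map)
    ]

# Toy 2x2 image; suppose label 5 = skirt (kept), label 2 = arm (masked out)
image = [[(10, 10, 10), (20, 20, 20)],
         [(30, 30, 30), (40, 40, 40)]]
labels = [[5, 2],
          [2, 5]]
masked = keep_region(image, labels, keep_labels={5})
```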

The masked area cut from the main image looks like this: [image]

Then I trained with the prompts the original images had, something like "Woman wearing blue skirt", etc.
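For reference, pairing the condition images, target photos, and captions for training might look something like the JSON-lines layout used in ControlNet's training tutorial; the field names and file paths here are assumptions for illustration:

```python
import json

# Hypothetical training pairs: each entry links a masked condition image
# ("source"), the original photo ("target"), and its caption ("prompt").
pairs = [
    {"source": "masks/0001.png", "target": "photos/0001.png",
     "prompt": "woman wearing blue skirt"},
    {"source": "masks/0002.png", "target": "photos/0002.png",
     "prompt": "man in red jacket"},
]

# One JSON object per line, as in a typical prompt.json training manifest
lines = [json.dumps(p) for p in pairs]
```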

In the end, it turned out alright. [image]

Might not apply to everyone, but that's what I ended up doing.

sethupavan12 avatar Jul 05 '23 22:07 sethupavan12

But of course, removing the "useless" areas means I can't generate at as fine-grained a level as I'd like, or control specific areas. Segmentation training would save you a lot of time. But if you've got time, maybe you can just mask each part and train on each part separately? I haven't tried this, so I don't know.

sethupavan12 avatar Jul 05 '23 22:07 sethupavan12

Ok, so basically you trained: prompt + segmentation map (all masks) = image

And then inference: prompt + masked area = output image

Is that it?

I guess this will work for dresses, as in your example, but not for full looks (shirt + pants + shoes)? For that case, you would need to match each prompt with its mask region, I think?

alelordelo avatar Jul 10 '23 16:07 alelordelo