ControlNet
ControlNet with SAM mask condition
Thanks for the great ControlNet. We have trained a ControlNet conditioned on SAM segmentation masks for fine-grained image generation. This model makes it possible to use SAM for image generation and editing. We hope this side project enables more interesting applications. Looking for contributions and suggestions! https://github.com/sail-sg/EditAnything
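For readers who want to try this, here is a minimal sketch of wiring a SAM-mask ControlNet into a diffusers pipeline. The checkpoint id below is hypothetical; the actual weights are linked from the EditAnything repo above, and the SD 2.1 base id is an assumption based on the thread.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Hypothetical checkpoint id -- substitute the real weights from
# https://github.com/sail-sg/EditAnything.
controlnet = ControlNetModel.from_pretrained(
    "sail-sg/edit-anything-controlnet", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",  # assumed SD2 base, per the thread
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The condition image is a colorized SAM segmentation mask.
condition = Image.open("sam_mask_colorized.png")
result = pipe(
    "a woman wearing a blue skirt",
    image=condition,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```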
How is the label color computed? Random color?
Yes, the label colors are randomly generated. We don't use the random colors for training; they are just for visualization.
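To make the visualization step concrete, here is a minimal sketch of the random colorization described above, assuming the SAM masks arrive as boolean arrays (as in the segment-anything output format):

```python
import numpy as np

def colorize_masks(masks, height, width, seed=0):
    # Paint each SAM mask a random color. The colors carry no meaning
    # for the model; they are only used to visualize the condition.
    rng = np.random.default_rng(seed)
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    for mask in masks:  # each mask: (H, W) boolean array
        canvas[mask] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return canvas
```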
Cool! Is it fine-tuned from the canny or scribble model?
We train it from Stable Diffusion 2 using the SAM mask as the condition. No pretrained canny or scribble model is used. Do you think using those models as the pretrained model would be better?
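As a point of reference, diffusers can initialize a fresh ControlNet directly from a base model's UNet, which roughly matches the train-from-SD2 setup described here. This is a sketch, not the authors' actual training code, and the SD 2.1 base id is an assumption:

```python
from diffusers import ControlNetModel, UNet2DConditionModel

# Copy the encoder weights of the SD2 UNet into a new ControlNet,
# rather than starting from a pretrained canny/scribble ControlNet.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", subfolder="unet"
)
controlnet = ControlNetModel.from_unet(unet)
```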
If we could have segmentation-conditioned ControlNet training, it would open up a ton of possibilities. @lllyasviel, any chance of adding this to the repo? : )
@alelordelo I was actually going to ask if this is possible, as I plan to train a ControlNet on my dataset of segmentations and prompts.
I assume it is currently not possible to train with segmentations then?
I think not now... : /
@gasvn @lllyasviel @sethupavan12, did you make any progress on the segmented training?
@alelordelo Yes... in a manner of speaking. Basically, I simplified my problem. Initially, I had a segmentation image like the following.
But then I realised I didn't need the other body parts, since my main focus was just the clothing/body area, as I was working on a virtual try-on problem.
So I masked out the useless areas and kept only the areas I wanted to train on (see the masking sketch below).
The masked area cut from the main image looks like this:
Then I trained with the prompts the original image had, something like "Woman wearing blue skirt", etc.
In the end it turned out alright.
Might not apply to everyone, but yeah, that's what I ended up doing.
But of course, removing the "useless" areas means I can't generate at as low a level as I want, or in specific areas. Segmentation training would save you a lot of time, but if you've got time, maybe you can just mask each part and train on each part? I haven't tried this, so I don't know.
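For illustration, a minimal sketch of the masking step described in this post, assuming the segmentation map is an array of integer class ids; the clothing ids are hypothetical and depend on your label scheme:

```python
import numpy as np
from PIL import Image

CLOTHING_IDS = {5, 6, 7}  # hypothetical label ids for the clothing classes

def keep_clothing_only(image_path, seg_path):
    # Black out everything except the clothing region, so training
    # focuses only on the area of interest.
    image = np.array(Image.open(image_path).convert("RGB"))
    seg = np.array(Image.open(seg_path))  # (H, W) array of class ids
    keep = np.isin(seg, list(CLOTHING_IDS))
    masked = np.where(keep[..., None], image, 0)
    return Image.fromarray(masked.astype(np.uint8))
```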
Ok, so basically you trained: prompt + segmentation map (all masks) = image
And then inference: prompt + masked area = output image
Is that it?
I guess this will work for dresses, as in your example, but not for full looks (shirt + pants + shoes)? For that case, you would need to match each prompt with its mask region, I think?