ControlNet
ControlNet with SAM mask condition
Thanks for the great ControlNet. We have trained a ControlNet conditioned on SAM segmentation masks for fine-grained image generation. This model makes it possible to use SAM for image generation and editing. We hope this side project enables more interesting applications. Looking for contributions and suggestions! https://github.com/sail-sg/EditAnything
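For readers who want to try this, here is a minimal sketch of wiring a SAM-mask ControlNet into a diffusers pipeline. The checkpoint id below is hypothetical; the actual weights are linked from the EditAnything repo above, and the SD 2.1 base id is an assumption based on the thread.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Hypothetical checkpoint id -- substitute the real weights from
# https://github.com/sail-sg/EditAnything.
controlnet = ControlNetModel.from_pretrained(
    "sail-sg/edit-anything-controlnet", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",  # assumed SD2 base, per the thread
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The condition image is a colorized SAM segmentation mask.
condition = Image.open("sam_mask_colorized.png")
result = pipe(
    "a woman wearing a blue skirt",
    image=condition,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```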
How is the label color computed? Random color?
Yes, the label colors are randomly generated. We don't use the random colors for training; they are just for visualization.
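To make the visualization step concrete, here is a minimal sketch of the random colorization described above, assuming the SAM masks arrive as boolean arrays (as in the segment-anything output format):

```python
import numpy as np

def colorize_masks(masks, height, width, seed=0):
    # Paint each SAM mask a random color. The colors carry no meaning
    # for the model; they are only used to visualize the condition.
    rng = np.random.default_rng(seed)
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    for mask in masks:  # each mask: (H, W) boolean array
        canvas[mask] = rng.integers(0, 256, size=3, dtype=np.uint8)
    return canvas
```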
Cool! Is it fine-tuned from the canny or scribble model?
We train it from Stable Diffusion 2 using the SAM mask as the condition. No pretrained canny or scribble model is used. Do you think using those models as the pretrained model would be better?
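As a point of reference, diffusers can initialize a fresh ControlNet directly from a base model's UNet, which roughly matches the train-from-SD2 setup described here. This is a sketch, not the authors' actual training code, and the SD 2.1 base id is an assumption:

```python
from diffusers import ControlNetModel, UNet2DConditionModel

# Copy the encoder weights of the SD2 UNet into a new ControlNet,
# rather than starting from a pretrained canny/scribble ControlNet.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", subfolder="unet"
)
controlnet = ControlNetModel.from_unet(unet)
```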
If we could have segmentation-conditioned ControlNet training, it would open up a ton of possibilities. @lllyasviel, any chance of adding this to the repo? : )
@alelordelo I was actually going to ask if this is possible, as I plan to train a ControlNet on my dataset of segmentations and prompts.
I assume it is currently not possible to train with segmentations then?
I think not now... : /
@gasvn @lllyasviel @sethupavan12, did you make any progress on the segmented training?
@alelordelo Yes... in a manner of speaking. Basically, I simplified my problem. Initially, I had a segmentation image like the following.
But then I realised I didn't need the other body parts, since my main focus was just the clothing/body area, as I was working on a virtual try-on problem.
So I masked out the useless areas and kept only the areas I wanted to train on (see the masking sketch below).
The masked area cut from the main image looks like this:
Then I trained with the prompts the original image had, something like "Woman wearing blue skirt", etc.
In the end it turned out alright.
Might not apply to everyone, but yeah, that's what I ended up doing.
But of course, removing the "useless" areas means I can't generate at as low a level as I want, or in specific areas. Segmentation training would save you a lot of time, but if you've got time, maybe you can just mask each part and train on each part? I haven't tried this, so I don't know.
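For illustration, a minimal sketch of the masking step described in this post, assuming the segmentation map is an array of integer class ids; the clothing ids are hypothetical and depend on your label scheme:

```python
import numpy as np
from PIL import Image

CLOTHING_IDS = {5, 6, 7}  # hypothetical label ids for the clothing classes

def keep_clothing_only(image_path, seg_path):
    # Black out everything except the clothing region, so training
    # focuses only on the area of interest.
    image = np.array(Image.open(image_path).convert("RGB"))
    seg = np.array(Image.open(seg_path))  # (H, W) array of class ids
    keep = np.isin(seg, list(CLOTHING_IDS))
    masked = np.where(keep[..., None], image, 0)
    return Image.fromarray(masked.astype(np.uint8))
```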
Ok, so basically you trained: prompt + segmentation map (all masks) = image
And then inference: prompt + masked area = output image
Is that it?
I guess this will work for dresses, as in your example, but not for full looks (shirt + pants + shoes)? For that case, you would need to match each prompt with its mask region, I think?