Stable-Diffusion-Inpainting: Training Pipeline V1.5, V2
What does this PR do?
This functionality allows training/fine-tuning of the 9 channel inpainting models provided by
- https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
- https://huggingface.co/runwayml/stable-diffusion-inpainting
This was motivated by noticing that many inpainting models provided to the community, e.g. on https://civitai.com/, have UNets with only 4 input channels. 4-channel models may lack capacity and, ultimately, quality on inpainting tasks. To support the community in developing fully fledged inpainting models, I have modified the text_to_image training pipeline to perform inpainting (a sketch of the resulting training step follows the additions below).
Additions:
- Added a random masking strategy (squares) during training and a center crop during validation
- Took the first 3 images of the pokemon dataset as the validation set
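For reference, here is a minimal sketch of how the training step can assemble the 9-channel UNet input from the noisy latents, the resized mask, and the masked-image latents. The helper names and the square-mask logic are illustrative, not the exact code in this PR:

```python
import torch
import torch.nn.functional as F


def make_square_mask(batch_size, height, width, device):
    """Illustrative random square mask per image: 1 = region to inpaint, 0 = keep."""
    mask = torch.zeros(batch_size, 1, height, width, device=device)
    for i in range(batch_size):
        size = torch.randint(height // 4, height // 2, (1,)).item()
        top = torch.randint(0, height - size, (1,)).item()
        left = torch.randint(0, width - size, (1,)).item()
        mask[i, :, top : top + size, left : left + size] = 1.0
    return mask


def prepare_inpainting_unet_input(pixel_values, vae, noise_scheduler):
    """Build the 9-channel input expected by the inpainting UNet from a batch of images."""
    mask = make_square_mask(pixel_values.shape[0], pixel_values.shape[2], pixel_values.shape[3], pixel_values.device)
    masked_images = pixel_values * (1.0 - mask)

    latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
    masked_latents = vae.encode(masked_images).latent_dist.sample() * vae.config.scaling_factor

    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],), device=latents.device
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # Downsample the mask to the latent resolution so it can be concatenated channel-wise.
    mask_latent = F.interpolate(mask, size=latents.shape[-2:])

    # 4 (noisy latents) + 1 (mask) + 4 (masked-image latents) = 9 input channels,
    # in the same channel order StableDiffusionInpaintPipeline uses at inference time.
    unet_input = torch.cat([noisy_latents, mask_latent, masked_latents], dim=1)
    return unet_input, noise, timesteps
```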
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline?
- [x] Did you read our philosophy doc (important for complex PRs)?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
Who can review?
@sayakpaul and @patrickvonplaten
Examples: Out-of-Training-Distribution Scenery
Prompt: a drawing of a green pokemon with red eyes
Pre-trained
Fine-tuned
Prompt: a green and yellow toy with a red nose
Pre-trained
Fine-tuned
Prompt: a red and white ball with an angry look on its face
Pre-trained
Fine-tuned
Hi @patil-suraj @sayakpaul, I was wondering if this is something you'd be interested in looking into? Feedback is appreciated.
Cool! Gentle ping @patil-suraj
I've experimented with finetuning proper inpainting models before. I strongly urge you to read the LAMA paper (https://arxiv.org/pdf/2109.07161.pdf) and implement their masking strategy (which is what is used by the stable-diffusion-inpainting checkpoint). I used a very simple masking strategy like what you had for a long time and never got satisfactory results with my model until switching to the LAMA masking strategy. Training on simple white square masks will severely degrade the performance of the pretrained SD inpainting model.
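To illustrate what a LaMa-style irregular mask looks like in practice, here is a rough sketch of a brush-stroke mask generator. It only loosely follows the paper's recipe (LaMa mixes irregular strokes with box masks according to configurable probabilities), and all parameter values are illustrative defaults rather than the paper's configuration:

```python
import math
import numpy as np


def random_irregular_mask(height, width, max_strokes=4, max_vertices=8, max_width=40, rng=None):
    """Brush-stroke style mask loosely inspired by the LaMa irregular masks (1 = hole, 0 = keep)."""
    rng = rng or np.random.default_rng()
    mask = np.zeros((height, width), dtype=np.float32)
    for _ in range(rng.integers(1, max_strokes + 1)):
        x, y = int(rng.integers(0, width)), int(rng.integers(0, height))
        brush_width = int(rng.integers(10, max_width))
        for _ in range(rng.integers(1, max_vertices + 1)):
            angle = rng.uniform(0, 2 * math.pi)
            length = int(rng.integers(10, max(height, width) // 4))
            nx = int(np.clip(x + length * math.cos(angle), 0, width - 1))
            ny = int(np.clip(y + length * math.sin(angle), 0, height - 1))
            # Draw a thick segment by stamping square patches along the line.
            steps = max(abs(nx - x), abs(ny - y), 1)
            for t in np.linspace(0.0, 1.0, steps):
                cx, cy = int(x + t * (nx - x)), int(y + t * (ny - y))
                y0, y1 = max(cy - brush_width // 2, 0), min(cy + brush_width // 2, height)
                x0, x1 = max(cx - brush_width // 2, 0), min(cx + brush_width // 2, width)
                mask[y0:y1, x0:x1] = 1.0
            x, y = nx, ny
    return mask
```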
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@sayakpaul I thought the simplest implementation would do, and the user could then decide which masking strategy to use. Sure, I will add that if it's a deal breaker.
@sayakpaul I have adapted the masking strategy from the LAMA paper on my local branch. I have a question: is it within the guidelines to keep the masking properties in a separate config file, like here: https://github.com/advimman/lama/blob/main/configs/training/data/abl-04-256-mh-dist-celeba.yaml#L10 ?
I feel it is a bit excessive and confusing to expose all of those property values as CLI arguments; it might clutter things and blur which arguments are model-specific and which are data-specific.
You are absolutely correct. What we can do is include a note about the masking strategy in the README and link to your implementation. Does that sound good?
I think we also need to add a test case here.
https://github.com/cryptexis/diffusers/blob/sd_15_inpainting/examples/inpainting/train_inpainting.py#L771 - in my repo I do not have anything similar under those lines. The piece of code you're referring to is here.
I see that https://huggingface.co/hf-internal-testing is used a lot in the tests. Are mere mortals able to add unit tests?
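For context, the example tests in diffusers typically launch a training script through accelerate against a tiny checkpoint and then assert on the saved outputs. A rough, hypothetical sketch of such a smoke test for this script; the tiny inpainting checkpoint id is a placeholder (a suitable 9-channel tiny pipeline would need to exist), and the real harness in examples/test_examples.py may differ:

```python
import os
import subprocess
import tempfile
import unittest


class TrainInpaintingSmokeTest(unittest.TestCase):
    def test_train_inpainting(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            # NOTE: the checkpoint id below is a placeholder, not a real repo;
            # a tiny 9-channel inpainting pipeline would have to be published first.
            # A much smaller dataset than the full pokemon set would be preferable in CI.
            cmd = [
                "accelerate", "launch",
                "examples/inpainting/train_inpainting.py",
                "--pretrained_model_name_or_path", "hf-internal-testing/tiny-stable-diffusion-inpaint-pipe",
                "--dataset_name", "lambdalabs/pokemon-blip-captions",
                "--resolution", "64",
                "--train_batch_size", "1",
                "--max_train_steps", "2",
                "--output_dir", tmpdir,
            ]
            subprocess.run(cmd, check=True)
            # Smoke-check that the fine-tuned UNet was written out.
            self.assertTrue(os.path.isdir(os.path.join(tmpdir, "unet")))
```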
Examples: Training with Random Masking
Inference with Square Mask (as before)
Prompt: a drawing of a green pokemon with red eyes
pre-trained stable-diffusion-inpainting
fine-tuned stable-diffusion-inpainting
pre-trained stable-diffusion-v1-5
fine-tuned stable-diffusion-v1-5 (no inpainting)
fine-tuned stable-diffusion-v1-5 (inpainting)
Inference with Random Mask
pre-trained stable-diffusion-inpainting
fine-tuned stable-diffusion-inpainting
pre-trained stable-diffusion-v1-5
fine-tuned stable-diffusion-v1-5 (no inpainting)
fine-tuned stable-diffusion-v1-5 (inpainting)
@cryptexis Thank you for providing the scripts and test cases. I want to train an inpainting model specifically for object removal, based on the sd1.5-inpainting model. The goal is to remove objects without using a prompt, just like the ldm-inpainting model. Although the sd1.5-inpainting model can achieve decent results with appropriate prompts (https://github.com/huggingface/diffusers/issues/973), it is often not easy to find those prompts, and the model tends to add extra objects.
Here's my plan right now:
- I will not modify the StableDiffusionInpaintPipeline code; all prompts used during training will be blank strings.
- The mask generation strategy will use methods from CM-GAN-Inpainting, which is better than LaMA for inpainting. First, a segmentation model processes the images to obtain object masks. Then, randomly generated masks will never completely cover an object (for example, using 50% IoU as a threshold; a rough sketch of this filtering follows below).
The generated mask looks like this:
I have not trained diffusion models before, any suggestions would be very helpful to me, thank you.
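A rough sketch of the object-aware mask filtering described in the plan above, assuming per-object masks from a segmentation model. It uses the fraction of each object that is covered rather than strict IoU, and the 0.5 threshold and helper names are illustrative:

```python
import numpy as np


def mask_is_valid(random_mask, object_masks, max_overlap=0.5):
    """Reject a random mask if it covers too much of any single object.

    random_mask:  (H, W) float array, 1 = region to inpaint.
    object_masks: list of (H, W) float arrays from a segmentation model, 1 = object.
    """
    for obj in object_masks:
        obj_area = obj.sum()
        if obj_area == 0:
            continue
        covered = (random_mask * obj).sum() / obj_area  # fraction of the object that is masked
        if covered > max_overlap:
            return False
    return True


def sample_training_mask(mask_generator, object_masks, max_tries=10):
    """Keep sampling random masks until one does not (almost) fully cover an object."""
    for _ in range(max_tries):
        mask = mask_generator()
        if mask_is_valid(mask, object_masks):
            return mask
    return mask  # fall back to the last sample if none passed the check
```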
Looking good. I think the only thing that is pending now is the testing suite.
@sayakpaul I worked on the tests yesterday and hit a wall. Then I tried to run the tests for text_to_image and hit the same wall.
Attaching the screenshot:
I was wondering if it is a systematic issue across all tests...
Had it been the case, it would have been caught in the CI. The CI doesn't indicate so. Feel free to push the tests and then we can work towards fixing them. WDYT?
BTW, for fixing the code quality issues, we need to run make style && make quality from the root of diffusers.
Done @sayakpaul, I think everything is addressed and the tests are pushed. Thanks a lot for the patience, support, and all the help!
How should the dataset be prepared?
With columns: image, mask, prompt?
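One possible way to lay out such a dataset with the datasets library, assuming a script that reads image, mask, and prompt columns. The column names are an assumption; the training script in this PR generates masks on the fly, so a separate mask column is only needed if you want to supply your own masks:

```python
from datasets import Dataset, Features, Image, Value

# Hypothetical layout: one image, one binary mask, one prompt per example.
# Column names are assumptions; check the column-name arguments of the script you use.
data = {
    "image": ["data/images/0001.png", "data/images/0002.png"],
    "mask": ["data/masks/0001.png", "data/masks/0002.png"],
    "prompt": [
        "a drawing of a green pokemon with red eyes",
        "a green and yellow toy with a red nose",
    ],
}
features = Features({"image": Image(), "mask": Image(), "prompt": Value("string")})
dataset = Dataset.from_dict(data, features=features)
dataset.save_to_disk("my_inpainting_dataset")  # or dataset.push_to_hub("<user>/my-inpainting-dataset")
```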
@cryptexis let's fix the example tests that are failing now.
Can anyone share a script for SDXL inpainting fine-tuning?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
When is this getting merged?
@cryptexis can you
- address the final comments here https://github.com/huggingface/diffusers/pull/6922#discussion_r1519390094 - if peft is not used we can remove it; otherwise we are all good
- make sure the tests pass
will merge once the tests pass!
@Sanster Thanks for your plan. I also want to fine-tune a Stable Diffusion inpainting model for object removal. Have you tried this? How is the performance?