
Best practices / example results and run settings for Dreambooth inpainting

Open aaronlohwy opened this issue 2 years ago • 3 comments

Hi @thedarkzeno @patil-suraj

I'm trying to run the script here, https://github.com/huggingface/diffusers/tree/main/examples/research_projects/dreambooth_inpaint (ty for this)

with a similar setup to training regular text=>image dreambooth. I'm finding that the quality of the generations for inpainting is far inferior to what I get from pure text => image generation.

Wondering if there are any rules of thumb that I should be following in order to get better results (e.g. does it require longer training, should I be modifying the masking method during training, etc.). Are there any sample runs/results from this pipeline that I can reference?

Thanks!

aaronlohwy avatar Dec 13 '22 09:12 aaronlohwy

Interesting! I don't know really and sadly won't have the time to look into it :sweat_smile: @thedarkzeno maybe?

patrickvonplaten avatar Dec 16 '22 15:12 patrickvonplaten

The training uses random masks, which may cause it to learn a bit more slowly. For me it worked well with more steps, around 500-1000, but it could be different for your dataset. I'm thinking about adding some segmentation option to improve the training, but I'm not sure I will have time soon.
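
For illustration, a random rectangular mask of the kind used during training can be sketched like this. This is not the exact helper in the script, and the size fractions are made-up defaults:

```python
import numpy as np
import torch

def random_rect_mask(height, width, min_frac=0.25, max_frac=0.75, rng=None):
    """Return a (1, H, W) float tensor: 1 inside a random rectangle, 0 elsewhere."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros((height, width), dtype=np.float32)
    # Pick a rectangle covering roughly min_frac..max_frac of each side.
    h = int(rng.uniform(min_frac, max_frac) * height)
    w = int(rng.uniform(min_frac, max_frac) * width)
    top = rng.integers(0, height - h + 1)
    left = rng.integers(0, width - w + 1)
    mask[top:top + h, left:left + w] = 1.0
    return torch.from_numpy(mask)[None]  # (1, H, W) binary mask tensor

mask = random_rect_mask(512, 512)
```

Because a random rectangle only occasionally covers the whole subject, the model rarely has to reconstruct the full concept in one step, which is one plausible reason it needs more steps than text-to-image DreamBooth.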

thedarkzeno avatar Dec 20 '22 14:12 thedarkzeno

Thanks for the clarification - I've been training for up to 1k steps on a concept with ~7 images. I think the main issue is that it doesn't seem to be learning the concept - I tested this by basically providing an all-white mask on top of a blank base, and it was unable to re-generate the concept (though typically dreambooth would be able to overfit very quickly). Are there any results / reference runs or notebooks? I can also play around with hparams but wanted to get an e2e run working first.
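
For reference, the all-white-mask check described above can be sketched like this (the checkpoint path and prompt are just placeholders for the fine-tuned weights and instance prompt):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load the DreamBooth-inpainting checkpoint (placeholder path).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "./dreambooth-inpaint-output", torch_dtype=torch.float16
).to("cuda")

# Blank base image plus an all-white mask: the model has to paint the whole canvas,
# so the output should show the learned concept if training worked.
base = Image.new("RGB", (512, 512), (0, 0, 0))
mask = Image.new("L", (512, 512), 255)

image = pipe(prompt="a photo of sks toy", image=base, mask_image=mask).images[0]
image.save("white_mask_test.png")
```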

As a side note - is there a general inpainting training script (similar to diffusers/examples/text_to_image/train_text_to_image.py)? I could also try that if so.

aaronlohwy avatar Dec 20 '22 17:12 aaronlohwy

I had the same issue with the inpainting script (https://github.com/huggingface/diffusers/tree/main/examples/research_projects/dreambooth_inpaint).

It didn't learn the concept on 100-500 training steps, using 10 example images.

This is what helped for me:

  • training with my own masks (see the sketch after this list);
  • training the text encoder, as suggested in the readme;
  • increasing the training steps to 2000.
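
Here is the sketch referred to above: a minimal dataset that pairs each instance image with a precomputed mask file instead of drawing a random one. This only illustrates the idea; the actual training script builds its examples differently, and the directory layout and transforms here are assumptions.

```python
from pathlib import Path
from PIL import Image
import torch
from torchvision import transforms

class MaskedInstanceDataset(torch.utils.data.Dataset):
    """Pairs instance images with precomputed masks (same filename in a masks/ folder)."""

    def __init__(self, image_dir, mask_dir, size=512):
        self.image_paths = sorted(Path(image_dir).glob("*.png"))
        self.mask_dir = Path(mask_dir)
        self.image_tf = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
            transforms.Normalize([0.5], [0.5]),
        ])
        self.mask_tf = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),  # single-channel mask in [0, 1]
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        mask = Image.open(self.mask_dir / self.image_paths[idx].name).convert("L")
        return {
            "instance_images": self.image_tf(image),
            "instance_masks": self.mask_tf(mask),
        }
```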

Here are the results for a sweatshirt (image attached in the original issue).

Still, it's strange that inpainting takes so many more training steps than DreamBooth text2image. @thedarkzeno Do you have any suggestions on why this might be the case?

belonel avatar Jan 08 '23 18:01 belonel

Hey @belonel, could you elaborate on the masks that you used? I believe the training for inpainting takes longer because of the random masks. I think that using a model like clipseg to generate masks could make it better, but I haven't had time to implement it.
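
For anyone who wants to try the clipseg idea, a rough sketch with the CLIPSeg checkpoint from transformers might look like this (the prompt, threshold, and resizing choices are arbitrary examples):

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("instance_01.png").convert("RGB")
inputs = processor(text=["a sweatshirt"], images=[image], return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # low-resolution per-pixel logits

# Threshold the sigmoid probabilities into a binary mask and resize to the image size.
probs = torch.sigmoid(logits).squeeze()
mask = Image.fromarray((probs > 0.5).numpy().astype("uint8") * 255).resize(image.size)
mask.save("instance_01_mask.png")
```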

thedarkzeno avatar Jan 09 '23 03:01 thedarkzeno

Thanks, I'll take a look at clipseg. I used cloth segmentation with u2net to train the model to reproduce clothes. The masks look like this (images attached in the original issue).
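
For context, one common way to get u2net cloth-segmentation masks is the rembg package; whether that matches the exact setup used here is an assumption, and the cloth model's output may need extra post-processing:

```python
from PIL import Image
from rembg import new_session, remove

# u2net_cloth_seg is rembg's clothing-segmentation model.
session = new_session("u2net_cloth_seg")

image = Image.open("sweatshirt_01.png").convert("RGB")
# only_mask=True returns the segmentation mask instead of the cut-out image.
mask = remove(image, session=session, only_mask=True)
mask.save("sweatshirt_01_mask.png")
```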

belonel avatar Jan 10 '23 06:01 belonel

@belonel these masks look pretty good; they must be much better for training the model than the random ones.

thedarkzeno avatar Jan 10 '23 21:01 thedarkzeno

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Feb 04 '23 15:02 github-actions[bot]

@belonel, do you have a sample notebook? How can I do that?

kadirnar avatar Mar 04 '23 16:03 kadirnar