
Text + partial image prompting

Open cachett-ML opened this issue 2 years ago • 3 comments

Hi !

With DALL-E, we can provide a partial image in addition to the text description, so that the model only has to complete the rest of the image. See:

[Image: Capture]

Can we do the same with your models? That would be awesome. I tried to modify the LAION-400M model notebook but without much success.

cachett-ML avatar Apr 07 '22 16:04 cachett-ML

Good question! I am also curious whether this is possible with the provided source code.

hyungkwonko avatar Apr 15 '22 14:04 hyungkwonko

@hyungkwonko Yes, it is possible. I have already implemented and tested it on my own; it just needs some coding.

Just pass the x0 and mask into the DDIM sampler, and you can do inpainting while using text prompts
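To make the idea concrete, here is a minimal NumPy sketch of the kind of masked blending a sampler can apply at each denoising step when it is given `x0` and `mask`. It is not the code from this repository or from the pull request: the function name `blend_known_region`, the argument `alpha_bar_t`, and the convention that mask value 1 marks the region to keep are all assumptions made for illustration.

```python
import numpy as np


def blend_known_region(x_t, x0, mask, alpha_bar_t, rng):
    """Overwrite the region to keep with a noised copy of the original.

    x_t         : current sample at timestep t
    x0          : clean original, same shape as x_t
    mask        : 1.0 where the original is kept, 0.0 where the model
                  generates (assumed convention, may be inverted elsewhere)
    alpha_bar_t : cumulative noise-schedule product at t, in (0, 1]
    rng         : numpy random Generator used to draw the noise
    """
    noise = rng.standard_normal(x0.shape)
    # Forward-diffuse the original to the same noise level as x_t.
    x0_noised = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise
    # Keep the noised original where mask == 1, the model's sample elsewhere.
    return mask * x0_noised + (1.0 - mask) * x_t
```

Calling this after every sampler step keeps the masked region tied to the original image while the text prompt steers the rest.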

@cachett-ML for your case, mask the image (e.g. keep only the top half), then do the inpainting with text prompts. You can try the same thing, but I have no idea whether it will work well... IMO, a better approach would be to condition on the visible areas during training, rather than using this text-conditioned txt2img model for the inpainting task, but this model is the best we can get our hands on so far : )
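As a rough illustration of the "top half" mask mentioned above, the following sketch builds a binary mask array. The helper name `top_half_mask` and the convention (1.0 for the visible part to keep, 0.0 for the part to generate) are assumptions; the script in the pull request may expect the opposite convention or a mask stored as an image file.

```python
import numpy as np


def top_half_mask(height, width):
    """Binary mask: 1.0 in the top half (kept), 0.0 in the bottom half."""
    mask = np.zeros((height, width), dtype=np.float32)
    mask[: height // 2, :] = 1.0
    return mask


# Hypothetical 256x256 mask matching the size of the partial input image.
mask = top_half_mask(256, 256)
```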

python scripts/txt2img.py --prompt "a cat is wearing a green hat" --image_prompt test_inputs/images/images.jpeg --mask_prompt test_inputs/masks/images.png --ddim_eta 0.0 --n_samples 8 --n_iter 4 --scale 5.0 --ddim_steps 50

[Image: images_inpainting]

Hope it helps, cheers

For the complete code of txt2img.py, I made a pull request; see:

https://github.com/CompVis/latent-diffusion/pull/57

lxj616 avatar Apr 19 '22 13:04 lxj616

Hey @lxj616, that looks pretty awesome! Thanks for your kind reply and great work. I will try it and share my own results.

hyungkwonko avatar Apr 19 '22 14:04 hyungkwonko