
img2img results in noisy image with low num_inference_steps


I'm trying to make a video with the StableDiffusionImg2ImgPipeline but am encountering some unexpected behavior. I generate the first frame with the StableDiffusionPipeline using relatively many steps (e.g. 100). This frame is then slightly warped (e.g. rotation, translation, zoom). Next, I feed the warped frame to StableDiffusionImg2ImgPipeline to slightly refine it. After a translation the image has empty regions at the edges (black pixels), which I expect img2img to fill in coherently.
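
Roughly, my loop looks like this (a minimal sketch: the model ID and prompt are placeholders, and the warp_frame helper here implements only the zoom case):

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
from PIL import Image

device = "cuda"
prompt = "a chess board"  # placeholder; the real prompt is longer
model_id = "runwayml/stable-diffusion-v1-5"  # placeholder model ID

txt2img = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)

def warp_frame(frame: Image.Image, zoom: float = 1.05) -> Image.Image:
    # Zoom case: crop the center and scale it back up to the original size.
    # Note there are no black borders here, unlike translation/rotation.
    w, h = frame.size
    cw, ch = int(w / zoom), int(h / zoom)
    left, top = (w - cw) // 2, (h - ch) // 2
    return frame.crop((left, top, left + cw, top + ch)).resize((w, h), Image.LANCZOS)

# First frame: plain text-to-image with relatively many steps
frame = txt2img(prompt, num_inference_steps=100).images[0]

frames = [frame]
for _ in range(30):  # number of video frames
    warped = warp_frame(frame)
    # Refine the warped frame; these are the settings that produce the noise
    frame = img2img(prompt=prompt, image=warped,
                    num_inference_steps=2, strength=1.0).images[0]
    frames.append(frame)
```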

However, when using a small num_inference_steps, the image becomes very noisy, while using more num_inference_steps results in a completely different image. I realize I can use strength to modulate this behavior, but since it is essentially the same as changing num_inference_steps, it doesn't help.

I realize that inpainting could help, but it only works for translation and rotation; when zooming in there are no black pixels to build a mask from.

Is there any setting for which this video generation would be smooth? Or is the StableDiffusionImg2ImgPipeline just not suited for this application/task?

jonasdoevenspeck avatar Nov 22 '22 22:11 jonasdoevenspeck

Can you post some input and output examples?

pedrogengo avatar Nov 23 '22 21:11 pedrogengo

@pedrogengo yes. I first generate the following image with StableDiffusionPipeline (the prompt is something with 'chess'):

[image: 000001]

This image is then slightly zoomed in (by only 5%) and fed into the StableDiffusionImg2ImgPipeline with the same prompt used to generate the initial image, num_inference_steps = 2 and strength = 1. The output is as follows:

[image: 000002]

jonasdoevenspeck avatar Nov 23 '22 21:11 jonasdoevenspeck

Ok, I got your point. The cause of this behavior is that setting num_inference_steps = 2 with strength = 1.0 is not the same as setting num_inference_steps = 10 with strength = 0.2.

If you look at the pipeline code, you can see it calls self.scheduler.set_timesteps before calling self.get_timesteps:

https://github.com/huggingface/diffusers/blob/f07a16e09bb5b1cf4fa2306bfa4ea791f24fa968/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L532-L533

If you go to the scheduler code (PNDM in this case), you will see that the value of num_inference_steps changes the result of step_ratio = self.config.num_train_timesteps // self.num_inference_steps:

https://github.com/huggingface/diffusers/blob/f07a16e09bb5b1cf4fa2306bfa4ea791f24fa968/src/diffusers/schedulers/scheduling_pndm.py#L157

This is why you are seeing this amount of noise. My recommendation in this case is to use a higher num_inference_steps and set a small strength, something like: num_inference_steps = 20 and strength = 0.2.
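
To make the difference concrete, here is a simplified sketch of how the two settings translate into actual denoising timesteps (it ignores the scheduler's steps_offset and the PLMS warm-up reordering, but the spacing logic is the same):

```python
num_train_timesteps = 1000  # Stable Diffusion default

def effective_timesteps(num_inference_steps, strength):
    # The scheduler spaces num_inference_steps timesteps across the training range...
    step_ratio = num_train_timesteps // num_inference_steps
    timesteps = list(range(0, num_train_timesteps, step_ratio))[::-1]
    # ...and img2img keeps only the last `strength` fraction of that schedule
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return timesteps[t_start:]

print(effective_timesteps(2, 1.0))   # [500, 0]
print(effective_timesteps(20, 0.2))  # [150, 100, 50, 0]
```

Both calls run only a handful of UNet steps, but the first noises the image all the way to t = 500 and has to reconstruct it in two enormous jumps, while the second only lightly noises it (to t = 150) and denoises it in four small steps.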

I hope it helps you in some way :)

pedrogengo avatar Nov 23 '22 22:11 pedrogengo

I can only second @pedrogengo here; the strength parameter is way too strong.

patrickvonplaten avatar Nov 29 '22 12:11 patrickvonplaten

Thanks for the tips @pedrogengo and @patrickvonplaten. I tried your suggestion (but with num_inference_steps = 100 and strength = 0.1), and indeed the first step looks better now:

[image: 000003]

However, when zooming in further, the image becomes blurry again:

[image: 000009]

I tried many different combinations of num_inference_steps and strength, but I can't manage to get a good result. My goal is to make a video with smooth transitions between frames, with:

  1. zoom, translation, rotation between frames.
  2. new features emerging in the image

To give you an idea, I already made such a video with VQGAN-CLIP: https://www.youtube.com/watch?v=T_bii9VLDk0. I was hoping to get better-looking visuals with Stable Diffusion. However, is it possible that, given the way Stable Diffusion works, such videos simply cannot be made?

jonasdoevenspeck avatar Nov 30 '22 20:11 jonasdoevenspeck

@patrickvonplaten and @pedrogengo, are you aware of any examples (code and video) that have successfully used the img2img pipeline to generate videos?

jonasdoevenspeck avatar Dec 24 '22 09:12 jonasdoevenspeck

That's a good question, @jonasdoevenspeck!

I'm sure someone in our Discord would know :-) Would you like to join the Discord (https://discord.gg/G7tWnz98XR) and maybe ask there under "discussions" for diffusion models? I'm sure someone has some good leads.

patrickvonplaten avatar Jan 03 '23 11:01 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 27 '23 15:01 github-actions[bot]