img2img results in noisy image with low num_inference_steps
I'm trying to make a video with the `StableDiffusionImg2ImgPipeline` but am encountering some unexpected behavior.
I generate the first frame using the `StableDiffusionPipeline` with a relatively high number of steps (e.g. 100). This frame is then slightly warped (e.g. rotation, translation, zoom). Next, I feed the warped frame to the `StableDiffusionImg2ImgPipeline` to slightly refine it. The translated image now has empty sides (black pixels), which I expect the pipeline to fill in coherently.
However, with a small `num_inference_steps` the image becomes very noisy, while a larger `num_inference_steps` results in a completely different image. I realize I can use `strength` to modulate this behavior, but since it is essentially the same as changing `num_inference_steps`, it doesn't help.
I realize that inpainting could help, but it only works for translation and rotation; for zoom there are no black pixels to mask.
Is there any setting for which this video generation would be smooth? Or is the `StableDiffusionImg2ImgPipeline` just not suited for this application/task?
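For context, the warp step between frames can be sketched roughly like this. This is a minimal example using PIL; `zoom_frame` is a hypothetical helper written for illustration, not part of diffusers:

```python
from PIL import Image

def zoom_frame(frame: Image.Image, zoom: float = 1.05) -> Image.Image:
    """Center-crop the frame by 1/zoom and scale it back up,
    simulating a slight camera zoom between video frames."""
    w, h = frame.size
    crop_w, crop_h = int(w / zoom), int(h / zoom)
    left = (w - crop_w) // 2
    top = (h - crop_h) // 2
    cropped = frame.crop((left, top, left + crop_w, top + crop_h))
    return cropped.resize((w, h), Image.LANCZOS)
```

The zoomed frame (same size as the input, so it can be passed straight back to the pipeline) is then fed to img2img for refinement.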
Can you share some input and output examples?
@pedrogengo yes.
I first generate the following image with `StableDiffusionPipeline` (the prompt is something with 'chess').
This image is then slightly zoomed in (by 5% only) and fed into the `StableDiffusionImg2ImgPipeline` with the same prompt used to generate the initial image, `num_inference_steps = 2`, and `strength = 1`.
The output is as follows:
Ok, I got your point. The issue is that setting `num_inference_steps = 2` with `strength = 1` is not the same as setting `num_inference_steps = 10` with `strength = 0.2`.
If you look at the pipeline code, you can see it calls `self.scheduler.set_timesteps` before calling `self.get_timesteps`:
https://github.com/huggingface/diffusers/blob/f07a16e09bb5b1cf4fa2306bfa4ea791f24fa968/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L532-L533
If you go to the scheduler code (PNDM in this case), you will see that the value of `num_inference_steps` changes the result of `step_ratio = self.config.num_train_timesteps // self.num_inference_steps`:
https://github.com/huggingface/diffusers/blob/f07a16e09bb5b1cf4fa2306bfa4ea791f24fa968/src/diffusers/schedulers/scheduling_pndm.py#L157
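To see concretely how `num_inference_steps` changes the timestep spacing, here is that arithmetic in isolation (a sketch assuming Stable Diffusion's default `num_train_timesteps = 1000`, not the actual scheduler code):

```python
num_train_timesteps = 1000  # Stable Diffusion default

def pndm_step_ratio(num_inference_steps: int) -> int:
    # Mirrors: step_ratio = num_train_timesteps // num_inference_steps
    return num_train_timesteps // num_inference_steps

# With only 2 inference steps, consecutive timesteps are 500 apart,
# so each denoising step must bridge a huge amount of noise at once.
print(pndm_step_ratio(2))    # 500
print(pndm_step_ratio(10))   # 100
print(pndm_step_ratio(100))  # 10
```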
This is why you are seeing this amount of noise. My recommendation in this case is to use a higher `num_inference_steps` and a small `strength`, something like `num_inference_steps = 20` and `strength = 0.2`.
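As a sanity check on the numbers: `get_timesteps` keeps roughly `int(num_inference_steps * strength)` timesteps from the low-noise end of the schedule, so the settings can be compared directly (a simplified sketch of that logic, not the pipeline code itself):

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    # get_timesteps keeps roughly int(num_inference_steps * strength)
    # timesteps from the low-noise end of the schedule
    return min(int(num_inference_steps * strength), num_inference_steps)

print(effective_steps(2, 1.0))   # 2
print(effective_steps(10, 0.2))  # 2
print(effective_steps(20, 0.2))  # 4
```

So both of the earlier settings run two actual denoising steps, just at very differently spaced timesteps, while the recommended settings run four gentler steps.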
I hope it helps you in some way :)
Can only second @pedrogengo here, the `strength` parameter is way too strong
Thanks for the tips @pedrogengo and @patrickvonplaten. I tried your suggestion (but with `num_inference_steps = 100` and `strength = 0.1`) and indeed it looks better for the first step now:
However when zooming in further, the image becomes blurry again:
I tried many different combinations of `num_inference_steps` and `strength`, but I can't manage to get a good result. My goal is to make a video with smooth transitions between frames and with:
- zoom, translation, rotation between frames.
- new features emerging in the image
To give you an idea, I already made such a video with VQGAN-CLIP: https://www.youtube.com/watch?v=T_bii9VLDk0. I was hoping to get better-looking visuals with Stable Diffusion. However, is it possible that, given the way Stable Diffusion works, such videos simply cannot be made?
@patrickvonplaten and @pedrogengo, are you aware of any examples (code and video) that have successfully used the img2img pipeline to generate videos?
That's a good question! @jonasdoevenspeck,
I'm sure someone in our discord would know :-) Would you like to join the discord: https://discord.gg/G7tWnz98XR and maybe ask there under "discussions" for diffusion models? I'm sure someone has some good leads.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.