diffusers
Being able to input noisy init latents to StableDiffusionPipeline
At the moment, any two seeds are completely unrelated to one another, i.e. seed=1 and seed=2 have nothing to do with each other and are as different as seed=1 and seed=12891371. In Automatic1111, one can choose a subseed along with a strength that sets how far the noise pattern of one seed is morphed toward that of the other. It would be cool to add this behavior to the StableDiffusionPipeline. It would allow selectively changing parts of an image and would also make for smoother animations.
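To make the morphing idea concrete, here is a minimal sketch of blending the noise of two seeds. It assumes spherical interpolation (slerp) as the blending rule, which is one common choice because it keeps the blended noise at roughly the same norm as the inputs; plain linear blending would shrink it. Flat Python lists stand in for latent tensors so the example runs without torch, and the names seeded_noise and slerp as well as the 4x64x64 size are made up for illustration.

```python
import math
import random


def seeded_noise(seed, n):
    """Return n standard-normal samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def slerp(t, a, b):
    """Spherically interpolate between flat noise vectors a and b, 0 <= t <= 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if math.sin(omega) < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1.0 - t) * omega) / math.sin(omega)
    weight_b = math.sin(t * omega) / math.sin(omega)
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Hypothetical size: 4 latent channels at 64x64, flattened.
n = 4 * 64 * 64
base = seeded_noise(1, n)        # noise for the main seed
variation = seeded_noise(2, n)   # noise for the subseed
blended = slerp(0.1, base, variation)  # 10% of the way toward the subseed
```

With t=0 the result is the noise of the main seed, with t=1 it is the noise of the subseed, and small values of t give a pattern that stays close to the main seed.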
If I understand correctly, the solution should be quite straightforward. The StableDiffusionPipeline already accepts init latents. However, the prepare_latents() method still adds noise to them. If we had the option either to pass in the noise pattern ourselves or to pass latents with the noise already overlaid, so that no noise is added later on, we could do all the subseed adjustments we want. The solution could be as simple as an additional skip_noise flag:
    def prepare_latents(...):
        [....]
        if not skip_noise:
            shape = init_latents.shape
            noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
            # get latents
            init_latents = self.scheduler.add_noise(init_latents, noise, timestep)
        latents = init_latents
        return latents
Alternatively, the noise could be passed as an extra dimension of the input image, i.e. image.shape[1] == 4 would mean normal latents without noise added, and image.shape[1] == 5 would mean the additional dimension carries the noise pattern.
Just some ideas, not sure if this is on the roadmap or if it's too niche. I do realize that one could also write a custom pipeline for it.
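For anyone going the custom route, here is a rough sketch of what the two options amount to. It assumes a DDPM-style scheduler, where add_noise computes sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise; other schedulers scale the noise differently. The prepare_latents below is a hypothetical stand-in written on flat Python lists, not the diffusers method, and the noise and skip_noise parameters are the proposed additions rather than existing arguments.

```python
import math
import random


def add_noise(latents, noise, alpha_bar_t):
    """DDPM-style forward noising: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [signal_scale * x + noise_scale * e for x, e in zip(latents, noise)]


def prepare_latents(init_latents, alpha_bar_t, seed=None, noise=None, skip_noise=False):
    """Hypothetical stand-in for the pipeline method (not the diffusers code)."""
    if skip_noise:
        # The caller already supplied noised latents: pass them through unchanged.
        return list(init_latents)
    if noise is None:
        # Default behavior: draw fresh noise from the seed.
        rng = random.Random(seed)
        noise = [rng.gauss(0.0, 1.0) for _ in init_latents]
    return add_noise(init_latents, noise, alpha_bar_t)


# Toy values, chosen only to keep the arithmetic easy to follow.
clean = [0.5, -1.0, 2.0]
custom_noise = [1.0, 0.0, -1.0]

# Option 1: hand the noise pattern to prepare_latents.
from_noise = prepare_latents(clean, alpha_bar_t=0.36, noise=custom_noise)

# Option 2: noise the latents yourself, then ask prepare_latents to skip noising.
pre_noised = add_noise(clean, custom_noise, alpha_bar_t=0.36)
from_skip = prepare_latents(pre_noised, alpha_bar_t=0.36, skip_noise=True)
```

Both options end up with the same latents, so either one would be enough to plug in subseed-blended noise.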
Thanks for all your great work.
I think this is the same request as #7011.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.