
Fast generation of smaller images with the option to generate the large 512x512 version later

Open Marcophono2 opened this issue 3 years ago • 4 comments

Hello!

I am afraid there may be two technical reasons why my request cannot be solved, but maybe I am wrong. As everybody knows, not every generated image is really good, and the generation time for a 512x512 px image is rather long (3 seconds on an A100, okay, but who has one of those...). So I think it would make sense to generate a few smaller images first, e.g. 128x128 px. They are large enough to show whether it is worth generating the big version at all. I tried this already, but there are two problems:

  1. Smaller images are not generated significantly faster. In my tests, everything below 512x512 saves only about 10% of the time, while everything above 512x512 increases the rendering time very quickly (perhaps growing with the square of the size, I am not sure). It probably has to do with the training dataset, which was based on 512x512 px images, but maybe a small adjustment or add-on in the prediction code could solve this.

  2. If I use the same seed for the 512x512 image that was used for the small image, the result is totally different. Sure, that effect is known, but I had hoped it would not affect two size versions of a square image. Obviously it does (see the sketch below). Maybe there is a way to calculate the seed for a larger image from the seed of the smaller one? Maybe going 256x256 -> 512x512 gives a chance to compute some binding between pixels of the initial noise, or something like that? Okay, in this case it would be the opposite of pixel binding: binding the pixels from the 512x512 image would mean that for every small image the seeds for all larger versions have to be generated first. Probably a bit of overkill.
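For context, a minimal sketch of that experiment with the standard diffusers StableDiffusionPipeline (the model ID, prompt, and seed are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
prompt = "a photo of an astronaut riding a horse"

# Same seed for both sizes via a fixed generator.
small = pipe(prompt, height=128, width=128,
             generator=torch.Generator("cuda").manual_seed(42)).images[0]
large = pipe(prompt, height=512, width=512,
             generator=torch.Generator("cuda").manual_seed(42)).images[0]
# The two results differ: the initial latent tensors have different shapes,
# so the same seed does not give "the same picture, only larger".
```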

Best regards Marc

P.S.: If I may add a question here that is not totally off topic, just a bit: is there a way to see/save the initial latent noise image? I think if I grab the image generated after only one step (of the default 50), it is no longer the initial noise. Surely it should be possible to walk back through the weights that the input (pixel values) passed exactly once up to the end of step 1, but whoever knows how to do that could much more simply expose the initial noise image before the processing starts. 🤪 (not so easy to find a matching emoji here...)
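One possible way to get at the untouched starting noise, as a sketch: the StableDiffusionPipeline accepts pre-made latents via its `latents` argument, so you can create and save the initial noise yourself before handing it to the pipeline (model ID, seed, and prompt below are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

# Latent shape for a 512x512 image: 4 channels, spatial size 512 / 8 = 64.
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),
    generator=torch.Generator("cuda").manual_seed(42),
    device="cuda",
)
torch.save(latents, "initial_latents.pt")  # the untouched starting noise

# The pipeline uses the provided latents instead of sampling its own.
image = pipe("a placeholder prompt", latents=latents).images[0]
```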

Marcophono2 avatar Oct 04 '22 19:10 Marcophono2

Hey @Marcophono2,

Sorry I don't fully understand the issue here. Is the issue about generating 128x128 images faster?

patrickvonplaten avatar Oct 04 '22 22:10 patrickvonplaten

Sorry @patrickvonplaten, obviously I should have taken a bit more time to phrase my text. Yes, one question is whether it is possible to generate 128x128 pixel images significantly faster.

The second question was whether it is possible in any way to use the seed of a 128x128 image to generate a 512x512 image with the same motif; in other words, to get the higher-quality image from the 128x128 pixel image.

Best regards Marc

Marcophono2 avatar Oct 04 '22 23:10 Marcophono2

Thanks for clarifying.

We have multiple efforts to speed up generation, as summarized here: https://huggingface.co/docs/diffusers/main/en/optimization/fp16#memory-and-speed. However, I cannot think of a way to speed up 128x128 generation specifically, compared to 512x512.
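Roughly, those optimizations amount to something like the following sketch (exact options depend on the diffusers version; model ID and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

# fp16 weights roughly halve memory use and speed up inference on recent GPUs.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Attention slicing trades a little speed for a much smaller memory footprint.
pipe.enable_attention_slicing()

image = pipe("a placeholder prompt").images[0]
```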

Regarding the 2nd question, I sadly don't think that's possible. The random seed creates the initial latents, and if the size of those latents changes, I don't think torch's random function can in any way create a "compressed" random matrix that is very similar in distribution to the 512x512 one. A random idea would be to create a 512x512 random matrix, compress it down to a 128x128 random matrix yourself (e.g. by mean pooling), and then check whether the outputs match more or less, but that's just a thought :-)
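A rough sketch of that mean-pooling thought (whether the outputs actually end up similar is untested; the rescaling factor is an assumption to keep the pooled noise roughly unit-variance):

```python
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

# Latents for a 512x512 image: (1, 4, 64, 64).
latents_512 = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),
    generator=torch.Generator("cuda").manual_seed(42),
    device="cuda",
)

# Mean-pool down by 4x to get latents for a 128x128 image: (1, 4, 16, 16).
# Averaging 16 unit-variance values shrinks the std to 1/4, so rescale by 4
# to keep the pooled noise approximately unit-variance (assumption).
latents_128 = F.avg_pool2d(latents_512, kernel_size=4) * 4

prompt = "a placeholder prompt"
preview = pipe(prompt, height=128, width=128, latents=latents_128).images[0]
full = pipe(prompt, height=512, width=512, latents=latents_512).images[0]
# Compare preview and full to see whether the motifs resemble each other at all.
```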

patrickvonplaten avatar Oct 05 '22 10:10 patrickvonplaten

Thank you very much @patrickvonplaten! Not what I hoped to read, but good to know that my estimate was correct. ;)

Best regards Marc

Marcophono2 avatar Oct 05 '22 10:10 Marcophono2

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Nov 04 '22 15:11 github-actions[bot]