
Checkpoint trained on only 256x256 data?

Open carlini opened this issue 3 years ago • 9 comments

The README says the v1.1 checkpoint was trained on 256x256 images and then fine-tuned on 512x512 images. Is there any way we can access this 256x256 model as a 1.0 checkpoint? There are various purposes for which a lower-resolution model would be more useful. For example, if I want to denoise ImageNet images, then the 256x256 model better matches the size of ImageNet and so might perform better than the 512x512 model.

carlini avatar Aug 22 '22 23:08 carlini

I have the same request. Thanks

Yuheng-Li avatar Aug 26 '22 14:08 Yuheng-Li

> The README says the v1.1 checkpoint was trained on 256x256 images and then fine-tuned on 512x512 images. Is there any way we can access this 256x256 model as a 1.0 checkpoint? There are various purposes for which a lower-resolution model would be more useful. For example, if I want to denoise ImageNet images, then the 256x256 model better matches the size of ImageNet and so might perform better than the 512x512 model.

why would you want a shitter model?

breadbrowser avatar Aug 27 '22 01:08 breadbrowser

I have one specific use case in mind where 256x256 is, in fact, not "shittier": diffusion models make great denoisers for improving certified adversarial robustness, as long as the model's resolution matches the image size (https://arxiv.org/abs/2206.10550). So the fact that 256x256 is closer to 224x224 makes this model much better for that purpose.

I suspect it might be useful for other purposes as well; and if you don't think this model would be useful for you, then just don't use it.
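Concretely, the denoiser trick maps the smoothing noise level sigma to the diffusion timestep whose marginal noise level matches it. A minimal numpy sketch, assuming the standard DDPM linear beta schedule (beta from 1e-4 to 0.02 over 1000 steps; the schedule Stable Diffusion actually uses differs slightly):

```python
import numpy as np

# Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
# Dividing x_t by sqrt(abar_t) gives x0 + sigma_t * eps with
# sigma_t = sqrt((1 - abar_t) / abar_t), so we pick the t whose
# sigma_t is closest to the smoothing noise level we want to certify.
betas = np.linspace(1e-4, 0.02, 1000)  # assumed DDPM linear schedule
abar = np.cumprod(1.0 - betas)
sigmas = np.sqrt((1.0 - abar) / abar)

def timestep_for_sigma(sigma):
    """Timestep whose effective noise level is closest to sigma."""
    return int(np.argmin(np.abs(sigmas - sigma)))

t = timestep_for_sigma(0.5)  # e.g. certify at sigma = 0.5
```

One then scales the noisy input by sqrt(abar_t) and runs a single denoising step at timestep t.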

carlini avatar Aug 27 '22 01:08 carlini

@carlini, I found this https://huggingface.co/justinpinkney/miniSD.

mikeogezi avatar Mar 03 '23 00:03 mikeogezi

I found one here: https://huggingface.co/lambdalabs/sd-image-variations-diffusers. Just swap in the UNet from this model and it will work.
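Roughly like this (untested sketch; it assumes the two UNets are drop-in compatible as claimed above, and note that checkpoint is image-conditioned rather than text-conditioned, so results with text prompts may vary):

```python
# Untested sketch: load SD v1.4, then swap in the UNet from the
# 256x256-native checkpoint linked above. Requires the diffusers
# library and downloads of both checkpoints.
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers", subfolder="unet"
)
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", unet=unet
)
image = pipe("an astronaut riding a horse", height=256, width=256).images[0]
```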

pmzzs avatar Apr 07 '23 20:04 pmzzs

I would also be interested in the checkpoint of the model trained only on 256x256 data. It would be nice if you could provide it!

ksai2324 avatar Jun 22 '23 14:06 ksai2324

https://huggingface.co/lambdalabs/miniSD-diffusers

mikeogezi avatar Jun 22 '23 16:06 mikeogezi

@carlini Thanks for the important information. I want to reproduce the Stable Diffusion model. The first stage trains the model on 256x256 data and the second fine-tunes it on 512x512 images. Do these two stages use two different autoencoders, or the same autoencoder, which seems to have been trained on the OpenImages dataset? Thanks.
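For reference, the released first-stage model is (to my knowledge) a single KL-f8 autoencoder, and since it is fully convolutional the same weights should handle both stages; only the latent resolution changes. A quick sketch of the arithmetic, assuming downsampling factor f = 8 and 4 latent channels:

```python
# Latent-space sizes for the two training stages, assuming the
# KL-f8 autoencoder (spatial downsampling factor 8, 4 latent channels).
F = 8

def latent_shape(h, w, channels=4):
    """Latent tensor shape produced by a fully convolutional f=8 VAE."""
    assert h % F == 0 and w % F == 0, "input must be divisible by f"
    return (channels, h // F, w // F)

print(latent_shape(256, 256))  # stage 1 latents: (4, 32, 32)
print(latent_shape(512, 512))  # stage 2 latents: (4, 64, 64)
```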

wtliao avatar Aug 02 '23 09:08 wtliao

> @carlini Thanks for the important information. I want to reproduce the Stable Diffusion model. The first stage trains the model on 256x256 data and the second fine-tunes it on 512x512 images. Do these two stages use two different autoencoders, or the same autoencoder, which seems to have been trained on the OpenImages dataset? Thanks.

Hi, I have the same query.

jing-yu-lim avatar Feb 16 '24 03:02 jing-yu-lim