diffusers
Can pretrained autoencoder stage be used for any new dataset training?
In the latent diffusion paper the authors mention that: "A notable advantage of this approach is that we need to train the universal autoencoding stage only once and can therefore reuse it for multiple DM trainings or to explore possibly completely different tasks." Does that mean we do not have to retrain the autoencoder stage for image <-> latent space encoding/decoding when we want to train a DM on a new dataset, i.e., that the autoencoder is general enough? This seems pretty strange to me.
I see that here: https://github.com/huggingface/diffusers/pull/356
```python
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="vae",
    use_auth_token=args.use_auth_token,
)
```
the vae is just loaded, which suggests that this is the case.
cc @patil-suraj here. IMO we don't need to retrain the VAE as it's been trained very well already!
Yes, that's right. In stable/latent diffusion only the UNet is trained; the rest (VAE and text encoder) are kept frozen. They are each trained separately beforehand. And as Patrick mentioned, they are already well trained, so it's not necessary to retrain them when fine-tuning stable diffusion.
So you mean the autoencoder will work fine even if I supply images of a different resolution / distribution (e.g. it has not seen images with cats and I supply images with cats)?
It might not give the best results for all resolutions. As for out-of-distribution images, since it's trained on a huge dataset, I think it can handle most cases.