diffusers
Can pretrained autoencoder stage be used for any new dataset training?
In the latent diffusion paper the authors mention that: "A notable advantage of this approach is that we need to train the universal autoencoding stage only once and can therefore reuse it for multiple DM trainings or to explore possibly completely different tasks." Does that mean we do not have to retrain the autoencoder stage for image <-> latent space encoding/decoding when we want to train a DM on a new dataset, i.e., that the autoencoder is general enough? This seems pretty strange to me.
I see that here: https://github.com/huggingface/diffusers/pull/356
```python
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    args.pretrained_model_name_or_path,
    subfolder="vae",
    use_auth_token=args.use_auth_token,
)
```
the vae is just loaded, which suggests that this is the case.
cc @patil-suraj here. IMO we don't need to retrain the VAE as it's been trained very well already!
Yes, that's right. In stable/latent diffusion only the UNet is trained; the rest (VAE and text encoder) are kept frozen. They are each trained separately beforehand. And as Patrick mentioned, they are already well trained, so it's not necessary to retrain them when fine-tuning stable diffusion.
So you mean the autoencoder will work fine even if I supply images of a different resolution / distribution (e.g. it has not seen images with cats and I supply images with cats)?
It might not give the best results for all resolutions. As for out-of-distribution images, since it's trained on a huge dataset, I think it can handle most cases.