
Combined loss term for VQ-VAE (`diffusers.VQModel`)

Open asy51 opened this issue 1 year ago • 2 comments

When training the VQ-VAE component of a latent diffusion model such as CompVis/ldm-celebahq-256 (which uses `diffusers.VQModel`), is there a single combined loss that sums the terms described by the authors: reconstruction loss, VQ loss, and commitment loss?
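For reference, the three terms are usually written as one objective. This is the form given in the VQ-VAE paper (van den Oord et al., 2017), with the reconstruction term left generic as $\mathcal{L}_{\text{rec}}$, since the actual choice (MSE, perceptual, etc.) depends on the training setup and is an assumption here:

$$
\mathcal{L} = \mathcal{L}_{\text{rec}}(x, \hat{x}) + \lVert \operatorname{sg}[z_e(x)] - e \rVert_2^2 + \beta \, \lVert z_e(x) - \operatorname{sg}[e] \rVert_2^2
$$

Here $z_e(x)$ is the encoder output, $e$ is the nearest codebook vector, $\operatorname{sg}[\cdot]$ is the stop-gradient operator, and $\beta$ weights the commitment term.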

I see that the VQ loss term is computed in `VectorQuantizer`, but it does not seem to be used anywhere else: https://github.com/huggingface/diffusers/blob/ebc99a77aad647c5d33eb36a33c23f7b3949cb40/src/diffusers/models/autoencoders/vae.py#L726-L730
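To make concrete what I mean by a combined loss, here is a minimal plain-PyTorch sketch. It does not use the diffusers API: `vq_vae_loss`, the toy codebook, and the identity "decoder" are all hypothetical names and stand-ins for illustration, and MSE is assumed for the reconstruction term. How to obtain `z_e` and `z_q` from `VQModel` depends on the installed diffusers version and is not shown.

```python
import torch
import torch.nn.functional as F


def vq_vae_loss(x, x_recon, z_e, z_q, beta=0.25):
    """Hypothetical helper: sum reconstruction, codebook (VQ), and commitment losses.

    x, x_recon: input and reconstruction (same shape)
    z_e: encoder output before quantization
    z_q: nearest codebook vectors for z_e, before the straight-through trick
    beta: commitment weight (0.25 is the value used in the VQ-VAE paper)
    """
    recon_loss = F.mse_loss(x_recon, x)  # assumption: MSE reconstruction
    # Codebook term: pulls codebook vectors toward the frozen encoder output.
    codebook_loss = F.mse_loss(z_q, z_e.detach())
    # Commitment term: pulls the encoder output toward the frozen codebook vectors.
    commitment_loss = F.mse_loss(z_e, z_q.detach())
    total = recon_loss + codebook_loss + beta * commitment_loss
    parts = {
        "recon": recon_loss,
        "codebook": codebook_loss,
        "commitment": commitment_loss,
    }
    return total, parts


# Toy setup: 8 codebook entries of dimension 4, identity "decoder".
torch.manual_seed(0)
codebook = torch.nn.Embedding(8, 4)
z_e = torch.randn(2, 5, 4, requires_grad=True)  # stand-in for encoder output
x = torch.randn(2, 5, 4)                        # stand-in for the input

# Nearest-neighbour lookup by squared Euclidean distance.
flat = z_e.reshape(-1, 4)
dists = ((flat[:, None, :] - codebook.weight[None, :, :]) ** 2).sum(dim=-1)
indices = dists.argmin(dim=1)
z_q = codebook(indices).view_as(z_e)

# Straight-through estimator: forward uses z_q, backward passes gradients to z_e.
z_st = z_e + (z_q - z_e).detach()
x_recon = z_st  # identity decoder, purely for illustration

total, parts = vq_vae_loss(x, x_recon, z_e, z_q)
total.backward()
```

After `backward()`, both `z_e` and `codebook.weight` receive gradients, which is the behavior I would expect from a `loss` attribute on the model output.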

I'm also open to alternatives to `VQModel`, such as `AutoencoderKL`, if they make it easier to collect the loss terms.

Thank you!

asy51, Apr 26 '24 16:04

For example, `Seq2SeqQuestionAnsweringModelOutput` has a `loss` attribute (https://github.com/huggingface/transformers/blob/9fe3f585bb4ea29f209dc705d269fbe292e1128f/src/transformers/modeling_outputs.py#L1169) which can be used to train transformers.T5... I'm looking for something similar in `VQModel`, or in any other VAE for that matter.

asy51, Apr 26 '24 23:04

Yes, the deep-floyd/IF project has these: https://github.com/deep-floyd/IF/blob/develop/deepfloyd_if/model/gaussian_diffusion.py#L739, but I can't remember seeing them anywhere in the diffusers project.

bghira, Apr 27 '24 02:04

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot], Sep 14 '24 15:09