Make-An-Audio 2 1D VAE

Open MoayedHajiAli opened this issue 11 months ago • 3 comments

Hello,

I noticed that the Make-An-Audio 2 paper does not report the reconstruction loss of your trained 1D VAE in comparison with the 2D one. Were you able to get similar reconstruction performance with the 1D VAE, or was its reconstruction performance inferior while the overall generation quality was better?

Thank you for your help.

MoayedHajiAli, Mar 09 '24 20:03

The reconstruction performance is slightly worse than that of the 2D VAE, but the 1D VAE works better when used with diffusion. Training a 1D VAE with 2D discriminators is prone to instability and can lead to overly smooth results, because the 2D PatchGAN discriminator is very strong. So we use R1 regularization and set r1_reg_weight=3, disc_factor=2 to stabilize the training: https://github.com/Text-to-Audio/Make-An-Audio/blob/8d4f84e6db5cb383673de3d63510410bc7deb037/ldm/modules/losses_audio/contperceptual.py#L54C1-L57C43
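For readers unfamiliar with R1 regularization, here is a minimal sketch of the idea, not the code in the linked file. R1 penalizes the squared gradient norm of the discriminator with respect to real inputs, which limits how sharply the discriminator can react. The function names, the hinge adversarial loss, and the way `disc_factor` and `r1_reg_weight` are combined are all assumptions made for illustration. The gradient is estimated with finite differences so the example runs in plain Python; a real training loop would use autograd instead.

```python
def r1_penalty(disc, real_batch, eps=1e-4):
    """Mean over the batch of ||dD/dx||^2 on real samples.

    The gradient is estimated with central differences (illustration only).
    """
    total = 0.0
    for x in real_batch:
        sq_norm = 0.0
        for i in range(len(x)):
            hi, lo = list(x), list(x)
            hi[i] += eps
            lo[i] -= eps
            grad_i = (disc(hi) - disc(lo)) / (2 * eps)
            sq_norm += grad_i ** 2
        total += sq_norm
    return total / len(real_batch)


def hinge_d_loss(logits_real, logits_fake):
    """Hinge discriminator loss (assumed here; other GAN losses are possible)."""
    loss_real = sum(max(0.0, 1.0 - l) for l in logits_real) / len(logits_real)
    loss_fake = sum(max(0.0, 1.0 + l) for l in logits_fake) / len(logits_fake)
    return 0.5 * (loss_real + loss_fake)


def discriminator_loss(disc, real_batch, fake_batch,
                       disc_factor=2.0, r1_reg_weight=3.0):
    """Hypothetical combination: weighted adversarial term plus weighted R1 term."""
    adv = hinge_d_loss([disc(x) for x in real_batch],
                       [disc(x) for x in fake_batch])
    return disc_factor * adv + r1_reg_weight * r1_penalty(disc, real_batch)


# Toy linear "discriminator" whose gradient is (0.5, -1.0) everywhere.
toy_disc = lambda x: 0.5 * x[0] - 1.0 * x[1]
real = [[1.0, 0.0], [2.0, 1.0]]
fake = [[0.0, 1.0]]
print(discriminator_loss(toy_disc, real, fake))
```

In this toy setup the penalty depends only on the slope of the discriminator, so a larger `r1_reg_weight` pushes it toward flatter, less aggressive decisions on real data.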

Darius-H, Mar 16 '24 14:03

Thank you very much @Darius-H for your help. I would appreciate it if you could share the reconstruction loss for the 1D VAE if it is available to you. Looking forward to the full training code and training configuration.

MoayedHajiAli, Mar 22 '24 19:03

Make-An-Audio 2 has been released at https://github.com/bytedance/Make-An-Audio-2. The loss figure for the 1D VAE: [image]

Darius-H, Jun 06 '24 17:06