DiffGAN-TTS Is adversarial training actually necessary?

Open nguyenhungquang opened this issue 2 years ago • 3 comments

I found that when I remove the adversarial loss and the feature matching loss, the model still works well, with no degradation in performance. This makes me question the role of adversarial training in reducing the number of inference steps; alternatively, this task may simply be easy enough to learn directly with the denoising model. Here are samples from the two models: https://drive.google.com/drive/folders/1uvURiQkOrP9n1jJsKyNe9NcSO4AfdFID?usp=sharing
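For anyone who wants to reproduce this kind of ablation, here is a minimal sketch of a generator objective with switchable terms. It is not the code from this repo: the function names, the `use_adv` / `use_fm` flags, the `lambda_fm` weight, the least-squares form of the adversarial term, and the toy inputs are all assumptions made for illustration.

```python
# Illustrative only: a toy generator loss with switchable adversarial and
# feature matching terms. All names here are hypothetical, not the repo's API.

def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length lists of floats."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_loss(mel_pred, mel_target, disc_fake_scores,
                   feats_fake, feats_real,
                   use_adv=True, use_fm=True, lambda_fm=1.0):
    # Reconstruction term: L1 distance between predicted and target mel values.
    total = mean_abs_diff(mel_pred, mel_target)

    if use_adv:
        # Assumed least-squares GAN form: push discriminator scores
        # on generated samples toward 1.
        total += sum((s - 1.0) ** 2 for s in disc_fake_scores) / len(disc_fake_scores)

    if use_fm:
        # Feature matching: L1 distance between discriminator features of
        # generated and real samples, summed over layers.
        total += lambda_fm * sum(
            mean_abs_diff(f, r) for f, r in zip(feats_fake, feats_real)
        )

    return total


# Toy inputs standing in for real tensors.
mel_pred, mel_target = [0.5, 1.0], [0.0, 1.0]
disc_fake_scores = [0.5, 1.0]
feats_fake, feats_real = [[1.0, 2.0]], [[1.0, 1.0]]

full = generator_loss(mel_pred, mel_target, disc_fake_scores, feats_fake, feats_real)
ablated = generator_loss(mel_pred, mel_target, disc_fake_scores, feats_fake, feats_real,
                         use_adv=False, use_fm=False)
```

With both flags off, only the reconstruction term remains, which corresponds to training the denoising model directly as described above.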

Jul 02 '22 06:07 nguyenhungquang

Hi @nguyenhungquang, thanks for sharing your insight. I found the same result when I built this repo and compared DiffSinger with DiffGAN-TTS. My conclusion was also that the LJSpeech task is too easy. In my opinion, GAN training should help the model generalize with a small number of steps when the dataset contains more expressive and noisy speech.

Jul 03 '22 15:07 keonlee9420

@keonlee9420 Thank you. I've also trained on my own dataset, which is a bit noisy, and it performs well. Although the mel-spectrogram is clearer when I visualise it, the difference is unlikely to be noticed when listening. I think the difference might be more visible on a multi-speaker dataset.

Jul 05 '22 03:07 nguyenhungquang

Good catch. I think it does make sense.

Jul 31 '22 05:07 keonlee9420
