DiffGAN-TTS

Is adversarial training actually necessary?

Open · nguyenhungquang opened this issue 2 years ago · 3 comments

I found that when I remove the adversarial loss and the feature matching loss, the model still works well, with no degradation in performance. This makes me question the role of adversarial training in reducing the number of inference steps; or perhaps this task is simple enough for the denoising model to learn directly. Here are samples from the two models: https://drive.google.com/drive/folders/1uvURiQkOrP9n1jJsKyNe9NcSO4AfdFID?usp=sharing

nguyenhungquang, Jul 02 '22 06:07
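To make the ablation in the first post concrete, here is a minimal sketch of a generator objective in which the adversarial and feature matching terms can be switched off. It is not the DiffGAN-TTS implementation: the function names, the `lambda_adv`/`lambda_fm` weights, the least-squares (LSGAN) form of the adversarial term, and the L1 form of the feature matching term are assumptions chosen for illustration, and plain Python lists stand in for tensors. Setting both weights to 0 leaves only the reconstruction term, which is the configuration reported above as working equally well.

```python
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def lsgan_generator_loss(fake_scores: Sequence[float]) -> float:
    """Assumed least-squares GAN generator term:
    pushes the discriminator's scores on generated samples toward 1."""
    return mean([(1.0 - s) ** 2 for s in fake_scores])


def feature_matching_loss(real_feats: List[List[float]],
                          fake_feats: List[List[float]]) -> float:
    """Assumed L1 feature matching: mean absolute difference between
    discriminator features of real and generated inputs, per layer,
    then averaged over layers."""
    per_layer = [
        mean([abs(r - f) for r, f in zip(real_layer, fake_layer)])
        for real_layer, fake_layer in zip(real_feats, fake_feats)
    ]
    return mean(per_layer)


def generator_loss(recon_loss: float,
                   fake_scores: Sequence[float],
                   real_feats: List[List[float]],
                   fake_feats: List[List[float]],
                   lambda_adv: float = 1.0,
                   lambda_fm: float = 1.0) -> float:
    """Hypothetical total generator objective.
    With lambda_adv = lambda_fm = 0 this reduces to recon_loss alone,
    i.e. the 'no adversarial training' ablation."""
    adv = lsgan_generator_loss(fake_scores)
    fm = feature_matching_loss(real_feats, fake_feats)
    return recon_loss + lambda_adv * adv + lambda_fm * fm


if __name__ == "__main__":
    # Toy values standing in for one training step.
    recon = 0.5
    scores = [0.2, 0.6]
    real = [[1.0, 2.0], [0.0]]
    fake = [[0.5, 2.5], [1.0]]
    print("full objective:", generator_loss(recon, scores, real, fake))
    print("ablated objective:",
          generator_loss(recon, scores, real, fake, lambda_adv=0.0, lambda_fm=0.0))
```

In a real training loop, disabling both terms would also let you skip the discriminator's forward pass and update entirely; the sketch still takes the discriminator outputs as inputs only to keep the two configurations side by side.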

Hi @nguyenhungquang, thanks for sharing your insight. I found the same result when I built this repo and compared DiffSinger with DiffGAN-TTS. My conclusion was also that the LJSpeech task is too easy. In my opinion, GAN training will help the model generalize with a small number of denoising steps when the dataset contains more expressive and noisy speech.

keonlee9420, Jul 03 '22 15:07

@keonlee9420 Thank you. I've also trained on my own dataset, which is a bit noisy, and it performs well. Although the mel-spectrogram looks clearer when I visualize it, the difference is unlikely to be noticed when listening. I think the difference might be more noticeable on a multi-speaker dataset.

nguyenhungquang, Jul 05 '22 03:07

Good catch. I think it does make sense.

keonlee9420, Jul 31 '22 05:07