wavenet_vocoder

Different Audio quality among intermediate results

Open auspicious3000 opened this issue 5 years ago • 2 comments

While training the vocoder, it writes to the following three folders: "audio", "dev_eval", and "train_no_dev_eval". The audio files inside "dev_eval" and "train_no_dev_eval" all sound very good. However, the audio files inside "audio" all have an audible hissing background noise. What's the difference between those three folders? Is it possible that the files written to the "audio" folder were somehow generated differently from the ones inside the other two folders? Thanks!

auspicious3000, Apr 14 '20 04:04

In short: the difference is whether teacher-forcing generation is used or not.

  • dev_eval: Results for the development (validation) set. All waveforms are generated by autoregressive generation (i.e. inference mode).
  • train_no_dev_eval: Results for the training set. All waveforms are generated by autoregressive generation (i.e. inference mode).
  • audio: Results for the training set. All waveforms are generated by teacher-forcing generation (i.e. training mode).
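
To make the two modes concrete, here is a minimal toy sketch of how they differ. It is not the code from this repository: `toy_model`, `teacher_forced`, and `autoregressive` are hypothetical names, and the "model" is just a fixed scaling of the previous sample standing in for the network.

```python
def toy_model(prev_sample):
    """Hypothetical stand-in for the network: predict the next sample
    from the previous one (assumption: a fixed 0.9 scaling)."""
    return 0.9 * prev_sample


def teacher_forced(ground_truth):
    """Training-mode generation: every step is conditioned on the
    ground-truth previous sample, never on the model's own output."""
    out = [ground_truth[0]]
    for t in range(1, len(ground_truth)):
        out.append(toy_model(ground_truth[t - 1]))
    return out


def autoregressive(first_sample, length):
    """Inference-mode generation: every step is conditioned on the
    sample the model itself produced at the previous step."""
    out = [first_sample]
    for _ in range(1, length):
        out.append(toy_model(out[-1]))
    return out


ground_truth = [1.0, 0.5, 0.25]
tf_out = teacher_forced(ground_truth)
ar_out = autoregressive(ground_truth[0], len(ground_truth))
```

The two outputs agree on the first predicted sample and then diverge, because the teacher-forced path keeps reading from `ground_truth` while the autoregressive path feeds back its own predictions.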

r9y9, Apr 14 '20 04:04

Does it make sense that the waveform generated in training mode sounds worse than the one generated in inference mode? In training mode the model has access to the previous ground-truth samples, so I would expect its output to sound at least as good as the output generated in inference mode. What's your opinion on this? Thanks!

auspicious3000, Apr 14 '20 04:04