nsynth_wavenet
Three differences compared to the paper
Hi, thank you for sharing this code. I found some differences compared to the WaveNet paper. 1. Why did you discard the skip connections in Parallel WaveNet, which are used in WaveNet? 2. I found a local condition, convolved from the mel spectrogram, between the IAF and the last ReLU layer. What does it mean? 3. Parallel WaveNet generates the output x from mu_tot and s_tot, in contrast to ClariNet, which regards the n-th sample z as the output. What do you think about it?
I am not the author of this code, but this is my understanding.
- Why did you discard the skip connections in Parallel WaveNet, which are used in WaveNet? The paper says so explicitly: "The student network consisted of the same WaveNet architecture layout, except with different inputs and outputs and no skip connections." (Parallel WaveNet: Fast High-Fidelity Speech Synthesis)
- I found a local condition, convolved from the mel spectrogram, between the IAF and the last ReLU layer. What does it mean? This is just a different implementation choice. I don't know whether it is important (I think it is not).
- Parallel WaveNet generates the output x from mu_tot and s_tot, in contrast to ClariNet, which regards the n-th sample z as the output. What do you think about it? My understanding is that DeepMind's Parallel WaveNet needs a significant amount of sampling to estimate the KL loss, so it makes sense that a student sample is drawn from the distribution defined by mu_tot and s_tot. ClariNet computes the KL loss in closed form, so the output of the last IAF flow can be used directly as the student sample.
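To make the third point concrete, here is a minimal sketch in plain Python. It is not code from this repository. The function names are hypothetical, each flow is assumed to be a simple affine transform `x * s + mu` at a single timestep, and Gaussians are used throughout as in ClariNet (Parallel WaveNet itself uses a logistic distribution). It shows how mu_tot and s_tot accumulate over a stack of flows, and compares the closed-form Gaussian KL with a sampling-based estimate.

```python
import math
import random


def compose_affine_flows(z, mus, scales):
    """Apply affine flows x_i = x_{i-1} * s_i + mu_i in order.

    Returns the output of the last flow together with the accumulated
    mu_tot and s_tot, so that x_last == z * s_tot + mu_tot.
    """
    x = z
    mu_tot, s_tot = 0.0, 1.0
    for mu, s in zip(mus, scales):
        x = x * s + mu
        mu_tot = mu_tot * s + mu  # earlier shifts are rescaled by later flows
        s_tot = s_tot * s
    return x, mu_tot, s_tot


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two univariate Gaussians."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def monte_carlo_kl(mu_q, sigma_q, mu_p, sigma_p, n=20000, seed=0):
    """Estimate KL(q || p) by averaging log q(x) - log p(x) over x ~ q."""
    rng = random.Random(seed)

    def log_pdf(x, mu, sigma):
        return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
                - 0.5 * ((x - mu) / sigma) ** 2)

    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu_q, sigma_q)
        total += log_pdf(x, mu_q, sigma_q) - log_pdf(x, mu_p, sigma_p)
    return total / n


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any trained model.
    x, mu_tot, s_tot = compose_affine_flows(0.3, [0.1, -0.2, 0.05], [0.9, 1.1, 0.8])
    print("last flow output:", x, "z * s_tot + mu_tot:", 0.3 * s_tot + mu_tot)
    print("closed-form KL:", gaussian_kl(0.2, 0.5, 0.0, 1.0))
    print("sampled KL:    ", monte_carlo_kl(0.2, 0.5, 0.0, 1.0))
```

Under these assumptions, the output of the last flow and `z * s_tot + mu_tot` are the same number for a given z. The practical difference sketched here is only in how the KL term is obtained: by averaging over many samples, or from the Gaussian formula directly.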
Thanks for your reply. I have corrected my mistake according to your answer. I read the ClariNet paper before Parallel WaveNet, so I did not notice the differences. Aha, that is not exactly "rigorous scholarship". However, both models generate noisy voices, at least worse than the teacher. Looking at the STFT, I find that the student cannot learn the high-frequency distribution. Any ideas for improving the model?