Parallel-Wavenet
The IAF?
Did you consider IAF (Inverse Autoregressive Flow)? The paper says the student uses IAF to generate the waveform in a parallelized way.
Yes, I think it is IAF now.
@kensun0, can you explain in more detail? It seems there are no mu_t and scale_t outputs in the original WaveNet. What does the z noise look like? I think it works like an autoencoder (in an autoregressive way), so z is just sampled from Logistic(0, 1) and has the same shape as the input x and the encoding? Thank you very much.
The original WaveNet outputs 256 softmax scores as a classification. The parallel paper says: "Since training a 65,536-way categorical distribution would be prohibitively costly, we instead modelled the samples with the discretized mixture of logistics distribution introduced in [23]." So mu_t and scale_t come from [23].
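To make the logistic-output idea concrete, here is a minimal sketch of drawing samples from a single logistic distribution given mu and scale, via inverse-CDF sampling. `sample_logistic` is a hypothetical helper for illustration, not code from the paper:

```python
import numpy as np

def sample_logistic(mu, scale, rng=np.random.default_rng(0)):
    # Inverse-CDF sampling: u ~ Uniform(0, 1), then the logistic quantile
    # function mu + scale * log(u / (1 - u)). The clipping avoids log(0).
    u = rng.uniform(1e-5, 1.0 - 1e-5, size=np.shape(mu))
    return mu + scale * (np.log(u) - np.log1p(-u))

x = sample_logistic(np.zeros(4), np.ones(4))  # four draws from Logistic(0, 1)
```

With a single mixture component, the network only has to predict mu_t and scale_t per timestep, and sampling is this one closed-form transform of uniform noise.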
If we got z in an autoregressive way, we couldn't generate the wave in parallel, right? I think z and x have the same shape; before computing x + enc, we must upsample the encoding to the shape of x.
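The upsampling step mentioned above can be as simple as nearest-neighbour frame repetition; here is a sketch (`upsample_encoding` is an assumed helper name — real implementations often use transposed convolutions instead):

```python
import numpy as np

def upsample_encoding(enc, wav_len):
    # enc: [frames, channels] conditioning features from the encoder.
    # Repeat each frame `hop` times so the result aligns with the waveform,
    # then trim to exactly wav_len samples.
    hop = wav_len // enc.shape[0]
    up = np.repeat(enc, hop, axis=0)
    return up[:wav_len]

up = upsample_encoding(np.arange(8).reshape(4, 2), 16)  # [4, 2] -> [16, 2]
```

After this, x and the upsampled encoding have matching first dimensions, so x + enc is well defined.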
@kensun0 Oh, I see. It will output the 3 parameters of the mixture-of-logistics distribution: pi_t, mu_t, scale_t, as in [PixelCNN++]? I am still confused about how to generate the wave: sample z noise and it will generate the wave in parallel? What's the output shape? Will you share your code? I can't wait to see the details.
Yes, if we use one mixture component, we can remove pi_t. Sorry, I won't share my code. The shape of the output is the same as z.
@kensun0 very nice of you, thank you!
@zhf459 My understanding is that when you use a logistic mixture model, at the end of the first flow you sample a wave-like result as the input of the next flow, and so on, until the last flow gives you a better wave sample. But when you use a categorical distribution, we just need one flow at the end to make the loss between the teacher and the student drop? I don't know if I understand it right. The IAF source code from OpenAI seems difficult for me to understand. Will we have to use all of the source code of the original IAF? There is too much code. Maybe we can work together to complete it.
@jiqizaisikao Yes, please email me: [email protected]
@kensun0 Hi, since the paper says the student WaveNet doesn't have skip-connection layers, what is the last layer? And there are 4 IAF flows with sizes [10, 10, 10, 30]; is each IAF flow a simplified WaveNet?
The last layer outputs the parameters of the logistic distribution; its shape is [wav length, channels]. If you use one mixture component, channels = 2: mu_tot and scale_tot. Yes, each IAF flow is a WaveNet.
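A minimal shape-check of that description, with identity parameters standing in for the real network output (this is only a sketch of the shapes, not the model):

```python
import numpy as np

wav_len = 8                                    # toy length; real wavs are much longer
rng = np.random.default_rng(1)
z = rng.logistic(0.0, 1.0, size=wav_len)       # white logistic noise input

# Pretend the last layer emitted [wav_len, channels=2]:
# column 0 = mu_tot, column 1 = scale_tot (identity parameters here).
out = np.zeros((wav_len, 2))
out[:, 1] = 1.0
mu_tot, scale_tot = out[:, 0], out[:, 1]

x = z * scale_tot + mu_tot                     # one parallel pass; same shape as z
```

With mu_tot = 0 and scale_tot = 1 the output is just the noise itself, which makes the "output has the same shape as z" point easy to verify.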
@kensun0 I use the original last layer with a one-mixture output in the student while using a 10-mixture logistic in the teacher; is that OK? How is your final result? Can you upload some samples?
That is OK. I also do that.
OK, I will try again.
@kensun0 hi, how do you calculate the power loss? I use the following code but get a very large loss. How can I fix this?

import numpy as np
import librosa

def get_power_loss(sample_, x_):
    batch = sample_.shape[0]
    s = 0
    for i in range(batch):
        # difference of squared-magnitude (power) spectrograms
        ss = np.abs(librosa.stft(sample_[i][0])) ** 2 - np.abs(librosa.stft(x_[i][0])) ** 2
        s += np.sum(ss ** 2)
    return s / batch
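One plausible reason for the huge values: summing the squared difference of *power* spectra raises magnitudes to the 4th power. A hedged numpy-only sketch of an alternative (this is one possible fix, not the paper's exact loss): compare magnitudes and average instead of summing.

```python
import numpy as np

def power_loss(sample, target, n_fft=512, hop=256):
    # Mean squared difference of STFT magnitude frames.
    # Using magnitudes (not squared powers) and a mean (not a sum)
    # keeps the value in a sane range.
    def mag(x):
        frames = [np.abs(np.fft.rfft(x[i:i + n_fft] * np.hanning(n_fft)))
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.stack(frames)
    d = mag(sample) - mag(target)
    return np.mean(d ** 2)

loss = power_loss(np.sin(np.linspace(0, 100, 2048)), np.zeros(2048))
```

Identical inputs give exactly zero, and the scale no longer explodes with louder audio.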
@zhf459 I have tested the power_loss and it works right, but I do not know how to implement the cross-entropy loss. Have you tried it?
@jiqizaisikao What do you mean by "it works right"? Did it work in parallel WaveNet? I have tried some ways to calculate the KL loss, but I have no idea whether they work or not.
wav = tf.contrib.signal.stft(wav, 512, 256, fft_length=512)
wav = tf.real(wav * tf.conj(wav))
# wav = tf.log(wav)
diff = sample - wav
loss_power = tf.reduce_mean(tf.reduce_mean(tf.square(diff), 0))
# loss_power = tf.log(loss_power)
@zhf459 Maybe you can publish your code; I will check it or follow it.
@zhf459 https://github.com/locuslab/pytorch_fft
@kensun0 Yes, please help me make it work! Thank you~ Check this: https://github.com/zhf459/P_wavenet_vocoder
@zhf459 I am so sorry, but I have no time to read PyTorch code. :-( If you follow Google's implementation, https://github.com/tensorflow/magenta/tree/master/magenta/models/nsynth , I can follow you easily.
Have you gotten any good-quality wav? My result is not ideal right now.
Yes, I got a normal-sounding wav, but it is worse than the original WaveNet.
My result is also normal, but worse than WORLD... (laughs)
@kensun0, could you share some of your examples?
And is the repo on your GitHub the final code of your parallel WaveNet?
I don't quite understand how to compute H(Ps) and H(Ps, Pt). How can the expectation be computed by Monte Carlo sampling?
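For what it's worth, here is how I understand the Monte Carlo estimate of the cross-entropy term: draw z from the logistic prior, push it through the student, and score the result under the teacher's per-timestep logistic. Everything below (`student_transform`, `teacher_params`) is a hypothetical stand-in for illustration, not the paper's code:

```python
import numpy as np

def logistic_log_pdf(x, mu, scale):
    # log density of Logistic(mu, scale) at x
    z = (x - mu) / scale
    return -z - np.log(scale) - 2.0 * np.log1p(np.exp(-z))

def mc_cross_entropy(student_transform, teacher_params,
                     n_samples=64, length=100, rng=np.random.default_rng(0)):
    # H(Ps, Pt) ~= -1/N * sum_n log Pt(x_n), where x_n = student(z_n)
    # and z_n ~ Logistic(0, 1), i.i.d. per timestep.
    total = 0.0
    for _ in range(n_samples):
        z = rng.logistic(0.0, 1.0, size=length)
        x = student_transform(z)
        mu_t, scale_t = teacher_params(x)
        total += -np.sum(logistic_log_pdf(x, mu_t, scale_t))
    return total / n_samples
```

Note that H(Ps) itself does not need sampling in the same way: the paper shows it reduces to an expectation of log scale_tot plus a constant, so only the cross-entropy part really needs Monte Carlo over z.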
I am not sure if your pseudo code for the student network is correct:
for f in flow:
    new_z = shiftright(z)
    for i in layers-1:
        new_z_i = H_i(new_z_i, θs_i)
        new_z_i += new_enc
    mu_s_f, scale_s_f = H_i(new_z_i, θs_i)  # last layer
    mu_tot = mu_s_f + mu_tot * scale_s_f
    scale_tot = scale_tot * scale_s_f
    z = z * scale_s_f + mu_s_f
I think new_z = shiftright(z) is not necessary.
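Independent of the shift-right question, the affine accumulation across flows can be checked numerically with a toy sketch. `flow_params` below is a stand-in for one student flow (in the real model these parameters come from dilated convolutions over the shifted z plus the upsampled encoding):

```python
import numpy as np

def flow_params(z, enc):
    # Stand-in for one student WaveNet flow; any function of (z, enc) works
    # for checking the accumulation identity, as long as scale stays positive.
    mu = 0.1 * np.tanh(z + enc)
    scale = 0.5 + 0.5 / (1.0 + np.exp(-(z + enc)))
    return mu, scale

rng = np.random.default_rng(0)
z = rng.logistic(0.0, 1.0, size=16)
z0 = z.copy()                       # keep the original noise for the check below
enc = np.zeros(16)
mu_tot, scale_tot = np.zeros(16), np.ones(16)
for _ in range(4):                  # 4 flows, like the [10, 10, 10, 30] setup
    mu_f, scale_f = flow_params(z, enc)
    mu_tot = mu_f + mu_tot * scale_f
    scale_tot = scale_tot * scale_f
    z = z * scale_f + mu_f
```

After the loop, z == z0 * scale_tot + mu_tot holds exactly, which is why the final output can be described by a single (mu_tot, scale_tot) pair per timestep.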
https://github.com/bfs18/nsynth_wavenet — I implemented a minimal demo of parallel WaveNet based on NSynth. Not finished tuning yet.
@bfs18 do you get any good samples?
@weixsong Sorry, I cannot do this; I used commercial datasets.