The IAF?

Open zhf459 opened this issue 7 years ago • 45 comments

Did you consider the IAF (Inverse Autoregressive Flow)? The paper says the student uses an IAF to generate the waveform in a parallelized way.

zhf459 avatar Feb 01 '18 02:02 zhf459

Yes, I think it is IAF now.

kensun0 avatar Feb 02 '18 02:02 kensun0

@kensun0, can you explain in more detail? It seems there are no mu_t and scale_t outputs in the original WaveNet. What does the z noise look like? I think it works like an autoencoder (in an autoregressive way), so z is just sampled from Logistic(0, 1) and has the same shape as the input x and the encoding? Thank you very much.

zhf459 avatar Feb 07 '18 09:02 zhf459

The original WaveNet outputs 256 softmax scores as a classification. The parallel paper says: "Since training a 65,536-way categorical distribution would be prohibitively costly, we instead modelled the samples with the discretized mixture of logistics distribution introduced in [23]." So mu_t and scale_t come from [23].
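
Not from the thread, but for concreteness: a minimal NumPy sketch of sampling from a single logistic with per-timestep mu_t and scale_t predicted by the network (all parameter values here are illustrative, not from the paper):

```python
import numpy as np

def sample_logistic(mu, scale, rng=None):
    """Draw samples from Logistic(mu, scale) via inverse-CDF sampling."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-5, 1.0 - 1e-5, size=np.shape(mu))  # avoid log(0)
    return mu + scale * (np.log(u) - np.log1p(-u))

# Per-timestep parameters as the network might predict them (illustrative values).
mu_t = np.zeros(16000)
scale_t = np.full(16000, 0.1)
x = sample_logistic(mu_t, scale_t)
print(x.shape)  # (16000,)
```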

kensun0 avatar Feb 07 '18 11:02 kensun0

If we got z in an autoregressive way, we couldn't generate the wave in a parallel way, right? I think z and x have the same shape; before computing x + enc, we must upsample the encoding to the shape of x.
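
As a sketch of that upsampling step (nearest-neighbor repetition; the hop length of 256 and the frame/channel sizes are assumptions for illustration):

```python
import numpy as np

def upsample_encoding(enc, hop=256):
    """Repeat each conditioning frame `hop` times so enc matches x in length.
    enc: [frames, channels] -> [frames * hop, channels]."""
    return np.repeat(enc, hop, axis=0)

enc = np.random.default_rng(0).standard_normal((63, 128))  # illustrative frames
enc_up = upsample_encoding(enc, hop=256)
print(enc_up.shape)  # (16128, 128)
```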

kensun0 avatar Feb 07 '18 12:02 kensun0

@kensun0 Oh, I see. So the output will be the 3 parameters of a mixture of logistics distribution: pi_t, mu_t, scale_t [PixelCNN++]? I am still confused about how to generate the wave: we sample z noise and it generates the wave in parallel, but what is the output shape? Will you share your code? I can't wait to see the details.

zhf459 avatar Feb 07 '18 13:02 zhf459

Yes, if we use one mixture, we can remove pi_t. Sorry, I won't share my code. The output has the same shape as z.

kensun0 avatar Feb 08 '18 04:02 kensun0

@kensun0 very nice of you, thank you!

zhf459 avatar Feb 08 '18 14:02 zhf459

@zhf459 My understanding is that when you use the logistic mixture model, at the end of the first flow you sample a wave-like result as the input of the next flow, and so on, until the last flow gives you a better sampled wave; but when you use a categorical distribution, we just need one flow at the end to make the loss between the teacher and the student drop? I don't know if I understand it right. The IAF source code from OpenAI seems difficult for me to understand. Will we have to use all of the source code of the original IAF? There is a lot of code. Maybe we can work together to complete it.

jiqizaisikao avatar Feb 09 '18 12:02 jiqizaisikao

@jiqizaisikao yes ,please email me [email protected]

zhf459 avatar Feb 11 '18 01:02 zhf459

@kensun0 Hi, since the paper says the student WaveNet doesn't have skip connections, what is its last layer? And there are 4 IAF flows with sizes [10, 10, 10, 30]; is each IAF flow a simplified WaveNet?

zhf459 avatar Feb 11 '18 08:02 zhf459

The last layer outputs the parameters of the logistic distribution; its shape is [wav_length, channels]. If you use one mixture, channels = 2: mu_tot and scale_tot. Yes, each IAF flow is a WaveNet.
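
To make the shapes concrete, here is a small NumPy sketch of a single-mixture output head with channels = 2 and the final affine transform of the noise (the softplus nonlinearity for the scale and all values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
wav_length = 16000

# Illustrative last-layer output for a single-mixture student:
# channels = 2, i.e. one (mu_tot, scale_tot) pair per sample.
last_layer_out = rng.standard_normal((wav_length, 2))
mu_tot = last_layer_out[:, 0]
# Keep the scale positive, e.g. with softplus (this choice is an assumption).
scale_tot = np.log1p(np.exp(last_layer_out[:, 1]))

z = rng.logistic(size=wav_length)   # logistic input noise, same shape as x
x = z * scale_tot + mu_tot          # one parallel affine transform
print(x.shape)  # (16000,)
```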

kensun0 avatar Feb 12 '18 07:02 kensun0

@kensun0, I use the original last layer with a one-mixture output in the student while the teacher uses a 10-mixture logistic. Is that OK? How is your final result? Can you upload some samples?

zhf459 avatar Feb 12 '18 13:02 zhf459

That is OK. I also do that.

kensun0 avatar Feb 13 '18 06:02 kensun0

OK, I will try again.

jiqizaisikao avatar Feb 25 '18 02:02 jiqizaisikao

@kensun0 Hi, how do you calculate the power loss? I use the following code but get a very large loss. How can I fix this:

import numpy as np
import librosa

def get_power_loss(sample_, x_):
    # Squared difference of the power spectra, averaged over the batch.
    batch = sample_.shape[0]
    s = 0.0
    for i in range(batch):
        ss = np.abs(librosa.stft(sample_[i][0])) ** 2 - np.abs(librosa.stft(x_[i][0])) ** 2
        s += np.sum(ss ** 2)
    return s / batch

zhf459 avatar Mar 06 '18 06:03 zhf459

@zhf459 I have tested the power loss and it works correctly, but I don't know how to implement the cross-entropy loss. Have you tried it?

jiqizaisikao avatar Mar 08 '18 09:03 jiqizaisikao

@jiqizaisikao What do you mean by "it works right" — did it work in parallel WaveNet? I have tried some ways to calculate the KL loss, but I have no idea whether they work or not.

zhf459 avatar Mar 08 '18 09:03 zhf459

wav = tf.contrib.signal.stft(wav, 512, 256, fft_length=512)
wav = tf.real(wav * tf.conj(wav))
# wav = tf.log(wav)
diff = sample - wav
loss_power = tf.reduce_mean(tf.reduce_mean(tf.square(diff), 0))
# loss_power = tf.log(loss_power)

kensun0 avatar Mar 08 '18 10:03 kensun0

@zhf459 Maybe you can publish your code; I will check or follow it.

kensun0 avatar Mar 08 '18 10:03 kensun0

@zhf459 https://github.com/locuslab/pytorch_fft

jiqizaisikao avatar Mar 09 '18 01:03 jiqizaisikao

@kensun0 Yes, please help me make it work! Thank you~ Check this: https://github.com/zhf459/P_wavenet_vocoder

zhf459 avatar Mar 09 '18 07:03 zhf459

@zhf459 I am so sorry that I have no time to read PyTorch code. :-( If you follow Google's implementation, https://github.com/tensorflow/magenta/tree/master/magenta/models/nsynth , I can follow you easily.

kensun0 avatar Mar 10 '18 07:03 kensun0

Have you got any good-quality wavs? My results so far are not ideal.

neverjoe avatar Apr 23 '18 06:04 neverjoe

Yes, I got normal-sounding wavs, but they are worse than the original WaveNet's.

kensun0 avatar Apr 25 '18 09:04 kensun0

My results are also normal, but worse than WORLD's... (lol)

neverjoe avatar Apr 25 '18 13:04 neverjoe

@kensun0, could you share some of your samples?

And is the repo on your GitHub the final code of your parallel WaveNet?

I don't quite understand how to compute H(Ps) and H(Ps, Pt). How can the expectation be computed by Monte Carlo sampling?
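
Not an answer from the thread, but a toy NumPy sketch of the Monte Carlo idea: for an affine IAF with a logistic output per sample, H(Ps) has the closed form sum_t [log s_t + 2], while H(Ps, Pt) is estimated by averaging the teacher's negative log-likelihood over samples drawn from the student. The constant student parameters and the standard-logistic teacher density are stubs/assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                                    # number of timesteps (illustrative)

def student_sample(n):
    """Draw n waveforms from the student: x = z * scale + mu (affine IAF stub)."""
    mu, scale = 0.0, 0.5                    # constant stub parameters (assumption)
    z = rng.logistic(size=(n, T))
    return z * scale + mu, np.full((n, T), scale)

def teacher_logpdf(x):
    """Stub teacher density: i.i.d. standard logistic per sample (assumption)."""
    return -x - 2.0 * np.log1p(np.exp(-x))

x, scale = student_sample(64)
# Closed-form student entropy per draw: sum_t [log s_t] + 2T (logistic entropy).
H_s = float(np.mean(np.sum(np.log(scale), axis=1) + 2.0 * T))
# Monte Carlo cross-entropy: average teacher NLL over student samples.
H_st = float(np.mean(-np.sum(teacher_logpdf(x), axis=1)))
kl = H_st - H_s                             # KL(Ps || Pt) estimate, >= 0
print(H_s, H_st, kl)
```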

weixsong avatar May 12 '18 09:05 weixsong

I am not sure whether your pseudocode for the student network is correct:

for f in flows:
    new_z = shiftright(z)
    for i in layers-1:
        new_z_i = H_i(new_z_i, θs_i)
        new_z_i += new_enc
    mu_s_f, scale_s_f = H_i(new_z_i, θs_i)    # last layer
    mu_tot = mu_s_f + mu_tot * scale_s_f
    scale_tot = scale_tot * scale_s_f
    z = z * scale_s_f + mu_s_f

I think new_z = shiftright(z) is not necessary.
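
A runnable toy version of that flow composition, with each flow's WaveNet replaced by a stub returning constant mu/scale (an assumption; in the real model they depend only on z_{<t}), showing that every timestep is updated in parallel and that the composed (mu_tot, scale_tot) reproduce the final output:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8                                         # toy waveform length

def flow_net(z, mu_c, scale_c):
    # Stub for one flow's WaveNet; constant outputs are an assumption here.
    return np.full_like(z, mu_c), np.full_like(z, scale_c)

z0 = rng.logistic(size=T)                     # input noise
z = z0.copy()
mu_tot, scale_tot = np.zeros(T), np.ones(T)
for mu_c, scale_c in [(0.1, 0.9), (-0.2, 0.8), (0.05, 1.1)]:   # 3 toy flows
    mu_f, scale_f = flow_net(z, mu_c, scale_c)
    mu_tot = mu_f + mu_tot * scale_f          # compose the affine maps
    scale_tot = scale_tot * scale_f
    z = z * scale_f + mu_f                    # every timestep updated at once

x = z                                         # equals z0 * scale_tot + mu_tot
print(x.shape)  # (8,)
```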

zhang-jian avatar May 24 '18 22:05 zhang-jian

https://github.com/bfs18/nsynth_wavenet I implemented a minimal demo of parallel WaveNet based on NSynth. I have not finished tuning it yet.

bfs18 avatar May 25 '18 08:05 bfs18

@bfs18 do you get any good samples?

zhf459 avatar May 25 '18 10:05 zhf459

@weixsong Sorry, I cannot do this; I used commercial datasets.

kensun0 avatar May 26 '18 10:05 kensun0