The IAF?

Open zhf459 opened this issue 7 years ago • 45 comments

Did you consider the IAF (Inverse Autoregressive Flow)? The paper says the student uses an IAF to generate the waveform in a parallelized way.

zhf459 avatar Feb 01 '18 02:02 zhf459

Yes, I think it is IAF now.

kensun0 avatar Feb 02 '18 02:02 kensun0

@kensun0, can you explain in more detail? It seems there are no mu_t and scale_t outputs in the original WaveNet. What does the z noise look like? I think it works like an autoencoder (in an autoregressive way), so z is just sampled from Logistic(0, 1) and has the same shape as the input x and the encoding? Thank you very much.

zhf459 avatar Feb 07 '18 09:02 zhf459

The original WaveNet outputs 256 softmax scores as a classification. The parallel paper says: "Since training a 65,536-way categorical distribution would be prohibitively costly, we instead modelled the samples with the discretized mixture of logistics distribution introduced in [23]." So mu_t and scale_t come from [23].
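
Not from the thread, but for concreteness: a minimal NumPy sketch of sampling from a single logistic with per-timestep mu_t and scale_t predicted by the network (all parameter values here are illustrative, not from the paper):

```python
import numpy as np

def sample_logistic(mu, scale, rng=None):
    """Draw samples from Logistic(mu, scale) via inverse-CDF sampling."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-5, 1.0 - 1e-5, size=np.shape(mu))  # avoid log(0)
    return mu + scale * (np.log(u) - np.log1p(-u))

# Per-timestep parameters as the network might predict them (illustrative values).
mu_t = np.zeros(16000)
scale_t = np.full(16000, 0.1)
x = sample_logistic(mu_t, scale_t)
print(x.shape)  # (16000,)
```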

kensun0 avatar Feb 07 '18 11:02 kensun0

If we got z in an autoregressive way, we couldn't generate the wave in a parallel way, right? I think z and x have the same shape; before computing x + enc, we must upsample the encoding to the shape of x.
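
As a sketch of that upsampling step (nearest-neighbor repetition; the hop length of 256 and the frame/channel sizes are assumptions for illustration):

```python
import numpy as np

def upsample_encoding(enc, hop=256):
    """Repeat each conditioning frame `hop` times so enc matches x in length.
    enc: [frames, channels] -> [frames * hop, channels]."""
    return np.repeat(enc, hop, axis=0)

enc = np.random.default_rng(0).standard_normal((63, 128))  # illustrative frames
enc_up = upsample_encoding(enc, hop=256)
print(enc_up.shape)  # (16128, 128)
```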

kensun0 avatar Feb 07 '18 12:02 kensun0

@kensun0 Oh, I see. So the output will be the 3 parameters of a mixture of logistics distribution: pi_t, mu_t, scale_t [PixelCNN++]? I am still confused about how to generate the wave: we sample z noise and it generates the wave in parallel, but what is the output shape? Will you share your code? I can't wait to see the details.

zhf459 avatar Feb 07 '18 13:02 zhf459

Yes, if we use one mixture, we can remove pi_t. Sorry, I won't share my code. The output has the same shape as z.

kensun0 avatar Feb 08 '18 04:02 kensun0

@kensun0 very nice of you, thank you!

zhf459 avatar Feb 08 '18 14:02 zhf459

@zhf459 My understanding is that when you use the logistic mixture model, at the end of the first flow you sample a wave-like result as the input of the next flow, and so on, until the last flow gives you a better sampled wave; but when you use a categorical distribution, we just need one flow at the end to make the loss between the teacher and the student drop? I don't know if I understand it right. The IAF source code from OpenAI seems difficult for me to understand. Will we have to use all of the source code of the original IAF? There is a lot of code. Maybe we can work together to complete it.

jiqizaisikao avatar Feb 09 '18 12:02 jiqizaisikao

@jiqizaisikao yes ,please email me [email protected]

zhf459 avatar Feb 11 '18 01:02 zhf459

@kensun0 Hi, since the paper says the student WaveNet doesn't have skip connections, what is its last layer? And there are 4 IAF flows with sizes [10, 10, 10, 30]; is each IAF flow a simplified WaveNet?

zhf459 avatar Feb 11 '18 08:02 zhf459

The last layer outputs the parameters of the logistic distribution; its shape is [wav_length, channels]. If you use one mixture, channels = 2: mu_tot and scale_tot. Yes, each IAF flow is a WaveNet.
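
To make the shapes concrete, here is a small NumPy sketch of a single-mixture output head with channels = 2 and the final affine transform of the noise (the softplus nonlinearity for the scale and all values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
wav_length = 16000

# Illustrative last-layer output for a single-mixture student:
# channels = 2, i.e. one (mu_tot, scale_tot) pair per sample.
last_layer_out = rng.standard_normal((wav_length, 2))
mu_tot = last_layer_out[:, 0]
# Keep the scale positive, e.g. with softplus (this choice is an assumption).
scale_tot = np.log1p(np.exp(last_layer_out[:, 1]))

z = rng.logistic(size=wav_length)   # logistic input noise, same shape as x
x = z * scale_tot + mu_tot          # one parallel affine transform
print(x.shape)  # (16000,)
```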

kensun0 avatar Feb 12 '18 07:02 kensun0

@kensun0, I use the original last layer with a one-mixture output in the student while the teacher uses a 10-mixture logistic. Is that OK? How is your final result? Can you upload some samples?

zhf459 avatar Feb 12 '18 13:02 zhf459

That is OK. I also do that.

kensun0 avatar Feb 13 '18 06:02 kensun0

OK, I will try again.

jiqizaisikao avatar Feb 25 '18 02:02 jiqizaisikao

@kensun0 Hi, how do you calculate the power loss? I use the following code but get a very large loss. How can I fix this:

import numpy as np
import librosa

def get_power_loss(sample_, x_):
    # Squared difference of the power spectra, averaged over the batch.
    batch = sample_.shape[0]
    s = 0.0
    for i in range(batch):
        ss = np.abs(librosa.stft(sample_[i][0])) ** 2 - np.abs(librosa.stft(x_[i][0])) ** 2
        s += np.sum(ss ** 2)
    return s / batch

zhf459 avatar Mar 06 '18 06:03 zhf459

@zhf459 I have tested the power loss and it works correctly, but I don't know how to implement the cross-entropy loss. Have you tried it?

jiqizaisikao avatar Mar 08 '18 09:03 jiqizaisikao

@jiqizaisikao What do you mean by "it works right" — did it work in parallel WaveNet? I have tried some ways to calculate the KL loss, but I have no idea whether they work or not.

zhf459 avatar Mar 08 '18 09:03 zhf459

wav = tf.contrib.signal.stft(wav, 512, 256, fft_length=512)
wav = tf.real(wav * tf.conj(wav))
# wav = tf.log(wav)
diff = sample - wav
loss_power = tf.reduce_mean(tf.reduce_mean(tf.square(diff), 0))
# loss_power = tf.log(loss_power)

kensun0 avatar Mar 08 '18 10:03 kensun0

@zhf459 Maybe you can publish your code; I will check or follow it.

kensun0 avatar Mar 08 '18 10:03 kensun0

@zhf459 https://github.com/locuslab/pytorch_fft

jiqizaisikao avatar Mar 09 '18 01:03 jiqizaisikao

@kensun0 Yes, please help me make it work! Thank you~ Check this: https://github.com/zhf459/P_wavenet_vocoder

zhf459 avatar Mar 09 '18 07:03 zhf459

@zhf459 I am so sorry that I have no time to read PyTorch code. :-( If you follow Google's implementation, https://github.com/tensorflow/magenta/tree/master/magenta/models/nsynth , I can follow you easily.

kensun0 avatar Mar 10 '18 07:03 kensun0

Have you got any good-quality wavs? My results so far are not ideal.

neverjoe avatar Apr 23 '18 06:04 neverjoe

Yes, I got normal-sounding wavs, but they are worse than the original WaveNet's.

kensun0 avatar Apr 25 '18 09:04 kensun0

My results are also normal, but worse than WORLD's... (lol)

neverjoe avatar Apr 25 '18 13:04 neverjoe

@kensun0, could you share some of your samples?

And is the repo on your GitHub the final code of your parallel WaveNet?

I don't quite understand how to compute H(Ps) and H(Ps, Pt). How can the expectation be computed by Monte Carlo sampling?
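
Not an answer from the thread, but a toy NumPy sketch of the Monte Carlo idea: for an affine IAF with a logistic output per sample, H(Ps) has the closed form sum_t [log s_t + 2], while H(Ps, Pt) is estimated by averaging the teacher's negative log-likelihood over samples drawn from the student. The constant student parameters and the standard-logistic teacher density are stubs/assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                                    # number of timesteps (illustrative)

def student_sample(n):
    """Draw n waveforms from the student: x = z * scale + mu (affine IAF stub)."""
    mu, scale = 0.0, 0.5                    # constant stub parameters (assumption)
    z = rng.logistic(size=(n, T))
    return z * scale + mu, np.full((n, T), scale)

def teacher_logpdf(x):
    """Stub teacher density: i.i.d. standard logistic per sample (assumption)."""
    return -x - 2.0 * np.log1p(np.exp(-x))

x, scale = student_sample(64)
# Closed-form student entropy per draw: sum_t [log s_t] + 2T (logistic entropy).
H_s = float(np.mean(np.sum(np.log(scale), axis=1) + 2.0 * T))
# Monte Carlo cross-entropy: average teacher NLL over student samples.
H_st = float(np.mean(-np.sum(teacher_logpdf(x), axis=1)))
kl = H_st - H_s                             # KL(Ps || Pt) estimate, >= 0
print(H_s, H_st, kl)
```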

weixsong avatar May 12 '18 09:05 weixsong

I am not sure whether your pseudocode for the student network is correct:

for f in flows:
    new_z = shiftright(z)
    for i in layers-1:
        new_z_i = H_i(new_z_i, θs_i)
        new_z_i += new_enc
    mu_s_f, scale_s_f = H_i(new_z_i, θs_i)    # last layer
    mu_tot = mu_s_f + mu_tot * scale_s_f
    scale_tot = scale_tot * scale_s_f
    z = z * scale_s_f + mu_s_f

I think new_z = shiftright(z) is not necessary.
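
A runnable toy version of that flow composition, with each flow's WaveNet replaced by a stub returning constant mu/scale (an assumption; in the real model they depend only on z_{<t}), showing that every timestep is updated in parallel and that the composed (mu_tot, scale_tot) reproduce the final output:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8                                         # toy waveform length

def flow_net(z, mu_c, scale_c):
    # Stub for one flow's WaveNet; constant outputs are an assumption here.
    return np.full_like(z, mu_c), np.full_like(z, scale_c)

z0 = rng.logistic(size=T)                     # input noise
z = z0.copy()
mu_tot, scale_tot = np.zeros(T), np.ones(T)
for mu_c, scale_c in [(0.1, 0.9), (-0.2, 0.8), (0.05, 1.1)]:   # 3 toy flows
    mu_f, scale_f = flow_net(z, mu_c, scale_c)
    mu_tot = mu_f + mu_tot * scale_f          # compose the affine maps
    scale_tot = scale_tot * scale_f
    z = z * scale_f + mu_f                    # every timestep updated at once

x = z                                         # equals z0 * scale_tot + mu_tot
print(x.shape)  # (8,)
```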

zhang-jian avatar May 24 '18 22:05 zhang-jian

https://github.com/bfs18/nsynth_wavenet I implemented a minimal demo of parallel WaveNet based on NSynth. I have not finished tuning it yet.

bfs18 avatar May 25 '18 08:05 bfs18

@bfs18 do you get any good samples?

zhf459 avatar May 25 '18 10:05 zhf459

@weixsong Sorry, I cannot do this; I used commercial datasets.

kensun0 avatar May 26 '18 10:05 kensun0