tensorflow-wavenet

Tips for audio processing

Open Garygunn94 opened this issue 7 years ago • 9 comments

Hi there. I've used the code from this repo but altered it to learn input-output mappings rather than act as a generative network, because I want to test whether it's possible to train a WaveNet to do audio processing such as noise removal or applying reverb.

My implementation 'works' in that it trains and produces results, but they are quite noisy. I'm trying to see whether more training will reduce this noise; however, the loss does not seem to go down any further past approx. 3.5.

From the results I'm currently getting, I can tell that the wavenet network is learning how to do the tasks I set out for it, just not very well. Would anyone have any thoughts or tips as to how I can alter the architecture or parameters in order to improve my results?

Much Appreciated.

Garygunn94 avatar Apr 17 '17 10:04 Garygunn94

I have used WaveNet on speech data, and my results improved when I experimented with different mini-batch sizes (--sample_size) and when I used a learning rate that is reduced every 20k steps by, say, 20%.

vjravi avatar Apr 17 '17 10:04 vjravi

@vijay-ravichandran Thanks for your suggestions. The default sample_size is 100000. How much of an increase or decrease would you recommend?

Garygunn94 avatar Apr 17 '17 10:04 Garygunn94

I have tried 50k, 25k and 12k. I found 12k with batch_size 2 to be optimal on the VCTK corpus. I would recommend keeping it below half of the average number of samples per audio file. But reducing the learning rate regularly seemed to contribute more than the sample size did.

vjravi avatar Apr 17 '17 10:04 vjravi
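The "less than half of the average file length" rule of thumb above can be turned into a quick check. This is only a sketch: the function name `max_sample_size` and the example lengths are made up for illustration, and the lengths are assumed to be counted in audio samples, the same unit as --sample_size.

```python
import math


def max_sample_size(file_lengths):
    """Largest integer sample_size strictly below half the mean file length.

    file_lengths: number of audio samples in each training file
    (hypothetical input; measure these on your own corpus).
    """
    if not file_lengths:
        raise ValueError("file_lengths must not be empty")
    half_mean = sum(file_lengths) / len(file_lengths) / 2
    # ceil(...) - 1 gives the largest integer strictly less than half_mean.
    return max(1, math.ceil(half_mean) - 1)


# Two hypothetical files of 40,000 and 60,000 samples.
limit = max_sample_size([40000, 60000])
```

The result is only an upper bound under that heuristic; the 12k value reported above was found by trying several sizes, so it is still worth experimenting below the bound.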

@vijay-ravichandran Great, I'll try these out. Thanks.

Garygunn94 avatar Apr 17 '17 10:04 Garygunn94

@vijay-ravichandran I'm already seeing significant improvements. Why does reducing the learning rate have such a big effect?

Garygunn94 avatar Apr 17 '17 10:04 Garygunn94

You want to start with a higher learning rate to get close to the optimal point quicker. But once you get close, you have to slow down so that you don't overshoot it. See http://cs231n.github.io/neural-networks-3/

[Image: learningrates]

vjravi avatar Apr 17 '17 10:04 vjravi
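For reference, the schedule described earlier in the thread (cut the rate by 20% every 20k steps) can be sketched in plain Python. The function name `step_decay` and the starting rate of 1e-3 are assumptions for illustration, not code taken from this repo.

```python
def step_decay(initial_rate, global_step, decay_every=20000, decay_factor=0.8):
    """Staircase learning-rate decay.

    Every `decay_every` training steps the rate is multiplied by
    `decay_factor` (0.8 means a 20% reduction). All defaults are
    illustrative values based on the numbers mentioned in this thread.
    """
    if global_step < 0:
        raise ValueError("global_step must be non-negative")
    num_decays = global_step // decay_every  # completed decay intervals
    return initial_rate * decay_factor ** num_decays


# Rate after 0, 20k and 40k steps, starting from an assumed 1e-3.
rates = [step_decay(1e-3, step) for step in (0, 20000, 40000)]
```

In TensorFlow 1.x the same staircase formula is available as `tf.train.exponential_decay` with `staircase=True`; how it would be wired into this repo's training script is not shown in the thread.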

@Garygunn94 hey man, I'm doing something similar, but have no experience with TF yet. I have a dataset of:

  1. background + front audio
  2. front audio (same exact audio just without background)

and I want to train an AI to remove the background. Do you have code to share?

Gintasz avatar Sep 14 '17 17:09 Gintasz
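A paired dataset like the one described above (mixed audio as input, clean audio as target) is typically cut into aligned windows before training. The following is a minimal sketch, assuming both recordings are already loaded as equal-length sequences of samples; the name `paired_windows` and the toy data are hypothetical and not part of any repo mentioned here.

```python
def paired_windows(noisy, clean, window_size):
    """Yield aligned (input, target) windows from two equal-length signals.

    noisy: samples of background + front audio (the network input)
    clean: samples of the front audio only (the training target)
    A trailing remainder shorter than window_size is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if len(noisy) != len(clean):
        raise ValueError("input and target must have the same length")
    for start in range(0, len(noisy) - window_size + 1, window_size):
        end = start + window_size
        yield noisy[start:end], clean[start:end]


# Toy signals; real audio would hold thousands of samples per window.
pairs = list(paired_windows([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2))
```

Keeping the two signals sample-aligned is the important part: each target window must correspond to exactly the same time span as its input window.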

@Gintasz Hi there!

I have a project on my GitHub which is essentially a clone of this one with a few modifications that allow the user to have separate input and output files: https://github.com/Garygunn94/wavenet

There are some audio samples on there already so you can see how it would work. You shouldn't need anything extra that is not in this repo.

Any questions, feel free to ask!

Garygunn94 avatar Sep 19 '17 15:09 Garygunn94

> You want to start with a higher learning rate to get close to the optimal point quicker. But once you get close, you have to slow down so that you don't overshoot it. See http://cs231n.github.io/neural-networks-3/
>
> [Image: learningrates]

Hi vjravi, how did you modify the code to reduce the learning rate? I assume it is not that difficult, but could you share the code? Thanks.

zegenerative avatar Dec 09 '18 20:12 zegenerative