FFTNet
Did you use this repo to train a vocoder?
@fatchord Hi, happy to see you again! I'm also working on the FFTNet. But in my experiments, I cannot get the similar results of the paper's demo page, mainly about conditional sampling and post-denoising. Do you try to reconstruct their results? Thanks.
@syang1993 Hi, how's it going? Yeah, I'm having similar problems - here's what my conditioned model sounds like after 300k steps (using 80-band mel-spectrograms):
I haven't implemented the noise reduction, is that algorithm publicly available? I had a quick look around and couldn't find it.
As for conditional sampling - I was going to implement a simple threshold, or perhaps an exponential moving average over the summed values in the conditioning frames, and use that to differentiate between voiced and unvoiced states. But I haven't got around to it yet, so perhaps that's why it doesn't sound so good.
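For what it's worth, here's roughly what I had in mind - a minimal sketch, assuming the conditioning input is an array of (log-)mel frames. The `threshold` and `alpha` values are made up and would need tuning per dataset; this is just the "sum the frame, smooth it, threshold it" idea, not anything from the paper:

```python
import numpy as np

def voiced_flags(mel, threshold=40.0, alpha=0.5):
    """Classify each conditioning frame as voiced (True) or unvoiced (False).

    mel: (n_frames, n_bands) array of mel-spectrogram frames.
    Sums the bands in each frame, smooths the per-frame sums with an
    exponential moving average, then thresholds the smoothed energy.
    threshold and alpha are hypothetical values - tune on your data.
    """
    frame_energy = mel.sum(axis=1)
    ema = np.empty_like(frame_energy)
    acc = frame_energy[0]
    for i, e in enumerate(frame_energy):
        acc = alpha * acc + (1.0 - alpha) * e  # exponential moving average
        ema[i] = acc
    return ema > threshold
```

The EMA is mainly there to stop single low-energy frames in the middle of a vowel from flipping the state back and forth.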
I'm curious what your implementation sounds like - any chance you could post a sample?
@fatchord I also used 80-band mel-spectrograms to train my model. Since the authors only cite a book for the noise reduction, I don't know which specific method they use - maybe Wiener filtering?
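If it is something Wiener-like, a rough post-filter would look like this - purely my guess at a stand-in, not the paper's method. It estimates the noise power spectrum from the first few frames (assumed non-speech) and attenuates each time-frequency bin by SNR/(SNR+1); `nperseg` and `noise_frames` are arbitrary choices:

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_denoise(wav, sr=16000, noise_frames=10, nperseg=512):
    """Rough Wiener-style post-filter (a guess, not the paper's method).

    Estimates the noise power spectrum from the first noise_frames STFT
    frames, then scales every bin by the Wiener gain SNR / (SNR + 1).
    """
    _, _, Z = stft(wav, fs=sr, nperseg=nperseg)
    noise_psd = np.mean(np.abs(Z[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    # spectral-subtraction estimate of the clean-signal power, floored at ~0
    sig_psd = np.maximum(np.abs(Z) ** 2 - noise_psd, 1e-10)
    gain = sig_psd / (sig_psd + noise_psd)
    _, clean = istft(Z * gain, fs=sr, nperseg=nperseg)
    return clean
```

It assumes the clip starts with silence/noise, which may not hold for these samples - a proper noise estimator would track minima over time instead.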
Since I'm on summer vacation, I can't send you my samples. But you can listen to generated-model.ckpt-200000.ema.pt.wav in https://github.com/syang1993/FFTNet/issues/2 - my results sound much the same (without conditional sampling and noise reduction). The audio is strident in some places, and when I tried random sampling rather than argmax, the generated speech got noisy.
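That matches my reading of why the paper does conditional sampling: pure argmax kills the noise but gives those strident artifacts, while pure random sampling is hissy. As I understand it (hedging - I may be misreading the paper), they sharpen the posterior in voiced regions by scaling the logits by a constant before the softmax, and sample the unscaled posterior when unvoiced. A minimal sketch, with `c = 2.0` as the scaling constant:

```python
import numpy as np

def sample_next(logits, voiced, rng, c=2.0):
    """Sample the next waveform value from the network's categorical posterior.

    voiced=True scales the logits by c to reduce randomness (my reading of
    the paper's conditional sampling); voiced=False samples the posterior
    as-is. c=2.0 is an assumption, not a confirmed setting.
    """
    if voiced:
        logits = logits * c  # sharpen the distribution in voiced regions
    # numerically stable softmax
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)
```

Compared to argmax this keeps a little dither in voiced parts (which seems to matter for mu-law outputs), while unvoiced parts keep their full noise-like variance.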
@fatchord this is not bad at all, although I know the goal is to replicate the paper's quality.