Using with Tacotron2
Hello,
I would like to connect a Tacotron2 model to LPCNet. Is there a way to convert the 80 mel coefficients (output of Taco2) into the 18 Bark-scale + 2 pitch parameters (input of LPCNet)?
On a related note, descriptions of the Bark scale, for example on Wikipedia, usually list 24 bands, and I don't understand why only 18 are computed here. Even taking the 16 kHz sampling rate into account, that would leave 22 of them, right?
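The count of 22 in the question can be checked with a short sketch. It assumes the commonly published Zwicker critical-band edges (24 bands up to 15.5 kHz) and a Nyquist frequency of 8 kHz for 16 kHz audio. The function name is made up for illustration, and the sketch only reproduces the arithmetic in the question; it does not explain how LPCNet arrives at 18 bands.

```python
# Zwicker critical-band (Bark) edges in Hz: 25 edges delimit 24 bands.
BARK_EDGES_HZ = [
    20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def bark_bands_below(nyquist_hz):
    """Return (bands lying entirely below nyquist_hz, bands starting below it)."""
    lower_edges = BARK_EDGES_HZ[:-1]
    upper_edges = BARK_EDGES_HZ[1:]
    fully_below = sum(1 for hi in upper_edges if hi <= nyquist_hz)
    starting_below = sum(1 for lo in lower_edges if lo < nyquist_hz)
    return fully_below, starting_below


if __name__ == "__main__":
    # 16 kHz sampling gives a Nyquist frequency of 8 kHz.
    print(bark_bands_below(16000 / 2))
```

With these edges, 21 bands lie entirely below 8 kHz and a 22nd (7700 to 9500 Hz) is cut by the Nyquist limit, which matches the 22 mentioned above.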
Thanks a lot :)
I am using Tacotron2 to predict 20-dimensional features for LPCNet, but there is noise in the synthesized audio.
Is there any way to improve the sound quality?
@superhg2012 I get the same problem, did you solve it?
I've tried with the current master of tacotron2 and with LPCTron, but both failed.
With an adaptation of my fork and the correct hparams I'm generating high-quality speech: audios.zip
I used the spanish branch of my fork plus MlWoo's adaptation of LPCNet. You need to change your paths and symbols; see the commit history: https://github.com/carlfm01/Tacotron-2/tree/spanish
@carlfm in your fork, could you let me know how to generate a wav from the f32 features? And is it as fast as the original LPCNet?
> how to generate a wav from the f32 features? And is it as fast as the original LPCNet?
The Tacotron repo predicts the features, not the wav. To generate the wav from the features predicted by Tacotron, you need to use the https://github.com/mlwoo/LPCNet fork.
For me, with a sparsity of 200, it is 3x faster than real time with AVX enabled.
@carlfm01 I already tried the https://github.com/mlwoo/LPCNet fork, but it generates a wav with too much noise, as I described in https://github.com/MlWoo/LPCNet/issues/6. How did you solve this problem? Any suggestions, please?
Is the noise with features predicted by Tacotron, or with the real features?
@carlfm01 With the real features. I converted real wav -> (by ./dump_data) s16 -> (./test_lpcnet) f32 -> (by ffmpeg) wav, as explained in MlWoo's repo. This is supposed to convert the f32 back to the original wav, but the noise is severe (the original voice is still audible, though). Have you experienced this? When you used MlWoo's fork, were speed and audio quality both perfect? If yes, what did you modify in MlWoo's code? Thank you so much for the help.
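For the last step of that pipeline (raw 16-bit PCM to wav), here is a minimal sketch using Python's standard `wave` module instead of ffmpeg. It assumes the raw file is headerless, little-endian, 16-bit, mono, 16 kHz audio; the function name and file names are placeholders, not anything from either repo.

```python
import wave


def raw_s16_to_wav(raw_path, wav_path, sample_rate=16000, channels=1):
    """Wrap headerless little-endian 16-bit PCM in a WAV container.

    Returns the duration of the audio in seconds.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return len(pcm) / (2 * channels * sample_rate)
```

Calling `raw_s16_to_wav("out.s16", "out.wav")` also gives the duration, which is a quick way to notice when the output is much longer or shorter than the input recording.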
> were speed and audio quality both perfect
Yes.
> What did you modify from MlWoo's code?
Nothing.
My only guess is that you may have made a mistake when compiling your exported weights.
https://github.com/mozilla/LPCNet/issues/58#issuecomment-533470433
Using MlWoo's fork: feature.zip
@carlfm01 Thanks. Let me explain in detail what I have done so far.
I now have two repositories: LPCNet (the original LPCNet repo) and LPCNet_MlWoo.
I trained LPCNet and got the nnet_data_* files in the LPCNet/src directory. I moved all of them to LPCNet_MlWoo/src, because running './dump_lpcnet.py lpcnet15_384_10_G16_64.h5' in the LPCNet_MlWoo repo didn't work (it failed with a model shape error). The lpcnet15_384_10_G16_64.h5 model was generated in the original LPCNet repo.
Then I just ran 'make dump_data taco=1' and 'make test_lpcnet taco=1'.
Do you think this makes sense? (I didn't change any parameters of LPCNet or LPCNet_MlWoo.)
> model was generated in original LPCNet repo
That's the issue. I'm afraid you need to retrain using MlWoo's fork; I did not train with LPCNet (this repo).
@carlfm01 But is there any difference between MlWoo's LPCNet training code and the original LPCNet training code? Aren't they exactly the same?
@carlfm01 So you did everything (training LPCNet, running inference, etc.) in MlWoo's repo, right? Which hyperparameters/options did you change?
> but Is there any difference between MLWoo's LPCNet training code and original LPCNet's LPCNet training code? aren't they exactly same?
No, otherwise you would be able to load models from both. I also tried, and it threw an error about a missing layer or an extra layer, I can't recall which. The inference code is also different.
> so you did everything (such as train LPCNet and inference the audio and etc) in MLWoo's repo, right?
Yes, with the defaults.
The only thing I changed was the training code, so that it loads checkpoints and adapts on new data.
This part is missing from LPCNet_MlWoo:
https://github.com/mozilla/LPCNet/blob/master/src/train_lpcnet.py#L106-L125
@carlfm01 Okay, thank you so much. I will try. And you are saying that when combining Tacotron2 + LPCNet, I had better use your spanish fork for Tacotron2, right?
> you are insisting that when merging Tacotron2 + LPCNet, I better use your spanish fork for tacotron2 right?
Yes, just change your paths and symbols; see the commit history to understand it better. I've tried LPCTron and the tacotron master, but both failed, generating noisy speech.
@carlfm01 Thank you so much 🙏. Wish you all the best. I will write again if I have other questions.
And share your results! 👍
@carlfm01 Hi, I followed all your instructions (retraining from MlWoo's repo) and have now trained for 6 epochs as a test. The original wav is about 3 seconds long, but the generated audio is about 8 seconds long. Have you experienced this problem?
Hello, no, I'm getting the same duration. Is it from real features?
@carlfm01 Yes, real features. Also, './test_lpcnet ~.h5' ran fine. This issue is strange... I'll look into it more. Thanks!
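One possible explanation for a stretch from roughly 3 s to roughly 8 s, offered only as a hypothesis: the feature file was written with one frame layout and read with another. The sketch below estimates the audio duration implied by a float32 feature file under a given layout. The numbers in it (55 floats per frame for a default build, 20 floats per frame for a taco=1 build, 10 ms of audio per frame) are assumptions that are not stated in this thread and should be checked against the fork's source. The function name is hypothetical.

```python
FLOAT32_BYTES = 4


def implied_duration_s(feature_bytes, floats_per_frame, frame_ms=10):
    """Duration of audio implied by a float32 feature file.

    floats_per_frame and frame_ms are assumptions about the feature
    layout, not values confirmed by the thread.
    """
    n_frames = (feature_bytes // FLOAT32_BYTES) // floats_per_frame
    return n_frames * frame_ms / 1000


if __name__ == "__main__":
    # A 3 s utterance written as 300 frames of 55 floats (assumed default layout).
    size = 300 * 55 * FLOAT32_BYTES
    print(implied_duration_s(size, 55))  # read back with the same layout
    print(implied_duration_s(size, 20))  # read back with the assumed taco layout
```

Under these assumed sizes, the same file read with the smaller frame layout implies 8.25 s instead of 3 s, which is close to the mismatch described above.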
@carlfm01 Are the sample rate, precision, and sample encoding of your training wav files 16000 Hz, 16-bit, and 16-bit signed integer PCM?
Please make sure you use 'make test_lpcnet taco=1' if you extracted the features with taco enabled for ./dump_data, or disable taco for both.
Yes, 16000, 16bit, mono
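Those three properties can be checked on a training file with Python's standard `wave` module. This is a minimal sketch; the function name is made up, and it only reads the WAV header, so it does not validate the audio content itself.

```python
import wave


def check_training_wav(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return (matches, actual) where actual is (rate, sample width, channels)."""
    with wave.open(path, "rb") as w:
        actual = (w.getframerate(), w.getsampwidth(), w.getnchannels())
    return actual == (rate, sample_width_bytes, channels), actual
```

A file that reports something like `(False, (22050, 2, 1))` would need resampling before being used for feature extraction.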
@carlfm01 I just ran both 'make dump_data taco=1' and 'make test_lpcnet taco=1', so they are both up-to-date.
What about quality? Do you get the same result after cleaning and testing without taco? Please also make sure you run 'make clean'.
@carlfm01 If I want to do this without taco, should I run 'make dump_data' and 'make test_lpcnet' instead of 'make dump_data taco=1' and 'make test_lpcnet taco=1'?
And yes, I think I ran 'make clean'.
> If i want to do them without taco, should I do 'make dump_data' and 'make test_lpcnet' instead of 'make dump_data taco=1' and 'make test_lpcnet taco=1' ?
Yes.
@carlfm01 It works now. Incredible. The problem was that I didn't run 'make clean' as the very first step. The generated audio samples are extremely clean and inference is much faster than real time. I will upload test results here in a few minutes. The only suspicious thing is that this works perfectly even after only 6 epochs of training... Thank you so much.
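To summarize the fix as a sketch: clean first, then build both tools with the same taco setting. The make targets and the `taco=1` flag are the ones quoted in this thread; the Python wrapper and its function names are only an illustration, and the repo path is a placeholder.

```python
import subprocess


def rebuild_commands(taco):
    """Commands for a clean rebuild with a consistent taco setting."""
    flag = ["taco=1"] if taco else []
    return [
        ["make", "clean"],  # drop stale objects from earlier builds
        ["make", "dump_data"] + flag,
        ["make", "test_lpcnet"] + flag,
    ]


def rebuild(taco, repo_dir="."):
    """Run the rebuild in repo_dir, stopping at the first failing command."""
    for cmd in rebuild_commands(taco):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, `rebuild(taco=True, repo_dir="LPCNet_MlWoo")` would run the three commands in order. Deriving both build commands from a single `taco` argument is meant to rule out the mismatch discussed above.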