Integrating LPCNET with Fastspeech

Open alokprasad opened this issue 4 years ago • 7 comments

@MlWoo have you tried FastSpeech for mel generation? It is astonishingly fast at generating mel spectrograms; combined with the LPCNet vocoder, it could work as real-time voice synthesis.

alokprasad avatar Mar 11 '20 01:03 alokprasad

@MlWoo I made some changes to FastSpeech to integrate it with LPCNet. Here are my changes:

1. First, I preprocessed the audio (LJSpeech) and converted it to PCM (s16):

mkdir -p dataset/LJSpeech-1.1/pcms
# sample rate: 16 kHz for lpcnet, or 22050?
for i in dataset/LJSpeech-1.1/wavs/*.wav
do
  # quote paths so file names with spaces do not break the loop
  sox "$i" -r 16000 -c 1 -t sw - > "dataset/LJSpeech-1.1/pcms/$(basename "$i" .wav).s16"
done
2. Then apply the diff below to FastSpeech to train the network using 20 mels:

https://github.com/alokprasad/binaries/blob/master/fast_speech_lpcnet.diff
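
As a quick sanity check on step 1, the size of each converted file can be turned back into a duration. This is a minimal sketch, assuming the files are headerless 16-bit mono PCM at 16 kHz as requested from sox above; `pcm_duration_seconds` is a hypothetical helper, not part of FastSpeech or LPCNet.

```python
import os

SAMPLE_RATE = 16000      # assumed: matches the -r 16000 passed to sox
BYTES_PER_SAMPLE = 2     # signed 16-bit PCM, one channel


def pcm_duration_seconds(path, sample_rate=SAMPLE_RATE):
    """Return the duration of a headerless s16 mono file (hypothetical helper)."""
    size = os.path.getsize(path)
    if size % BYTES_PER_SAMPLE:
        raise ValueError(f"{path}: size {size} is not a whole number of 16-bit samples")
    return size / BYTES_PER_SAMPLE / sample_rate
```

If the duration computed this way differs noticeably from that of the source WAV, the file was probably written at a different sample rate than the one assumed here.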

alokprasad avatar Mar 13 '20 02:03 alokprasad

@alokprasad thank you for watching the status of my repo. I am sorry, but I will not have time to put effort into TTS. I think Tacotron2 is good enough and fast enough on both GPU (13x real time on a 1080 Ti) and CPU, and it could achieve higher throughput than FastSpeech.

MlWoo avatar Mar 24 '20 16:03 MlWoo

@MlWoo I measured inference time for FastSpeech, and it is actually faster than Tacotron2 on CPU. E.g., for 12 s of audio, mel generation takes about 1.2 s on a single CPU core.
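
To put that figure on the same scale as the "13x real time" number quoted earlier, the timing can be expressed as a real-time factor. This is only an arithmetic sketch using the two numbers from this comment; `real_time_factor` is a hypothetical helper name.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures reported above: about 1.2 s of compute for 12 s of audio.
rtf = real_time_factor(1.2, 12.0)
speedup = 1.0 / rtf
print(f"RTF = {rtf:.2f}, about {speedup:.0f}x real time")
```

This works out to a real-time factor of about 0.1, or roughly 10x real time, for mel generation alone; the vocoder's cost is not included in the measurement.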

alokprasad avatar Mar 26 '20 03:03 alokprasad

@alokprasad great job! Hope you can share the work with us. It is really fast.

MlWoo avatar Mar 26 '20 05:03 MlWoo

@MlWoo For now I have integrated FastSpeech and SqueezeWave: https://github.com/alokprasad/fastspeech_squeezewave

alokprasad avatar Mar 26 '20 14:03 alokprasad

@alokprasad thank you. I will read it later.

MlWoo avatar Mar 26 '20 16:03 MlWoo

@MlWoo I know you are not working on this, but I just wanted to see if you faced any issue similar to the one below while integrating Tacotron2 and LPCNet:

, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "test_lpcnet.py", line 83, in <module>
    cfeat = enc.predict([features[c:c+1, :, :nb_used_features], periods[c:c+1, :, :]])
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1441, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_3 to have shape (None, 38) but got array with shape (992, 20)
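
The error says the model input expects 38 values per frame, while the array passed in has 20 values per frame (992 frames). One possible reconciliation is sketched below. It assumes the 20 values are 18 cepstral coefficients followed by 2 pitch parameters, and that the 38-wide layout keeps the cepstrum in columns 0-17, leaves columns 18-35 at zero, and places the pitch parameters in columns 36-37. That layout is an assumption about the LPCNet version in use, not something confirmed in this thread, and `expand_frame` and `expand_features` are hypothetical helpers.

```python
NB_CEPSTRAL = 18        # assumed number of cepstral coefficients per frame
NB_PITCH = 2            # assumed: pitch period and pitch correlation
NB_USED_FEATURES = 38   # per-frame width the model in the traceback expects


def expand_frame(frame20):
    """Spread one 20-value frame over a 38-value frame (assumed layout)."""
    if len(frame20) != NB_CEPSTRAL + NB_PITCH:
        raise ValueError(f"expected {NB_CEPSTRAL + NB_PITCH} values, got {len(frame20)}")
    frame38 = [0.0] * NB_USED_FEATURES
    # cepstral coefficients go to the first 18 columns
    frame38[:NB_CEPSTRAL] = frame20[:NB_CEPSTRAL]
    # pitch parameters go to the last 2 columns; the middle stays zero
    frame38[NB_USED_FEATURES - NB_PITCH:] = frame20[NB_CEPSTRAL:]
    return frame38


def expand_features(frames):
    """Apply expand_frame to every frame of an utterance."""
    return [expand_frame(f) for f in frames]
```

Whether this padding is right depends on how the features were dumped during training, so it is worth checking against the feature-dumping code of the LPCNet version being used before relying on it.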

alokprasad avatar Mar 31 '20 05:03 alokprasad