icefall I'm planning to implement some SOTA speech synthesis models on LJSpeech and LibriTTS, any suggestions?

Open lifeiteng opened this issue 3 years ago • 4 comments

I'm new to k2, but very familiar with Kaldi (I wrote kaldi-ctc).

Text Analyzer

In the first stage, it will be simple.
In the future, it will include
- Text Normalization
- Disambiguation of polyphonic words
- etc
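
To make the "simple" first stage concrete, a text analyzer at that level might do little more than the following. This is only an illustrative sketch: the function names, the tiny abbreviation table, and the 0-99 number range are all assumptions made for the example, not part of any planned implementation.

```python
import re

# Hypothetical, deliberately tiny abbreviation table (assumption for illustration).
_ABBREVIATIONS = {"mr.": "mister", "mrs.": "missus", "dr.": "doctor"}

_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if not 0 <= n < 100:
        raise ValueError(f"only 0..99 is supported in this sketch, got {n}")
    if n < 20:
        return _ONES[n]
    tens, ones = divmod(n, 10)
    return _TENS[tens] if ones == 0 else f"{_TENS[tens]}-{_ONES[ones]}"


def normalize_text(text: str) -> str:
    """Lowercase, expand a few abbreviations, spell out 1-2 digit numbers."""
    text = text.lower()
    for abbr, full in _ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    # Only standalone one- or two-digit integers are rewritten; longer
    # numbers (years, IDs) are left untouched for a later, smarter stage.
    text = re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)
    # Collapse runs of whitespace introduced by the input or the rewrites.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_text("Dr.  Smith has 42 cats"))
```

Anything beyond this (dates, currencies, polyphonic words) would need context-aware rules or a model, which is what the later stages in the list are for.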

Acoustic Models

Vocoder Models

Fully End-To-End Model

Production

A high-quality pronunciation dictionary
Text Analyzer
- Text Normalizer (and Segmenter for Chinese)
- Disambiguation of polyphonic words
Model: Text-To-Speech
Serving

In the first stage, I will focus on implementing Parallel Tacotron 2 and HiFi-GAN (maybe a new variant) on LJSpeech.

Aug 23 '22 08:08 lifeiteng

Would be nice if you can first show us some code/initial implementation about this.

Aug 24 '22 08:08 csukuangfj

> Would be nice if you can first show us some code/initial implementation about this.

I will make it happen in the next few weeks.

Aug 24 '22 10:08 lifeiteng

Very cool! You can find data preparation recipes for both LibriTTS and LJSpeech (and many more) in Lhotse.

https://github.com/lhotse-speech/lhotse/tree/master/lhotse/recipes
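
For orientation, here is a minimal stdlib-only sketch of the kind of parsing a data preparation step does for LJSpeech. It is not Lhotse's implementation and does not use its API. It assumes the standard LJSpeech layout: a `metadata.csv` with pipe-separated `id|raw text|normalized text` lines and audio under `wavs/<id>.wav`. The `Utterance` class and the function name are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List


@dataclass(frozen=True)
class Utterance:
    """One LJSpeech entry (hypothetical container for this sketch)."""
    utt_id: str
    audio_path: Path
    text: str
    normalized_text: str


def parse_ljspeech_metadata(lines: Iterable[str], corpus_dir: Path) -> List[Utterance]:
    """Turn metadata.csv lines into Utterance records."""
    utterances = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on "|" directly rather than using the csv module:
        # transcripts may contain quote characters that csv would interpret.
        fields = line.split("|")
        if len(fields) < 2:
            raise ValueError(f"malformed metadata line: {line!r}")
        utt_id, text = fields[0], fields[1]
        # Fall back to the raw text if the normalized column is missing or empty.
        normalized = fields[2] if len(fields) > 2 and fields[2] else text
        utterances.append(
            Utterance(utt_id, corpus_dir / "wavs" / f"{utt_id}.wav", text, normalized)
        )
    return utterances


if __name__ == "__main__":
    corpus = Path("LJSpeech-1.1")  # assumed local extraction directory
    with open(corpus / "metadata.csv", encoding="utf-8") as f:
        for utt in parse_ljspeech_metadata(f, corpus)[:3]:
            print(utt.utt_id, utt.audio_path, utt.normalized_text)
```

The Lhotse recipes linked above additionally handle downloading and produce recording and supervision manifests, so in practice they are the better starting point.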

Aug 24 '22 12:08 pzelasko

The text-to-spectrogram model is starting to converge; a good upsampling attention had already been learned by step 400.

[Image: LJ040-0031-10464-0]

Sep 02 '22 06:09 lifeiteng

Hi, we have added support for TTS datasets like LJSpeech and VCTK to icefall!

You can find them at https://github.com/k2-fsa/icefall/tree/master/egs/ljspeech/TTS and https://github.com/k2-fsa/icefall/pull/1380. Thanks!

Best

Dec 06 '23 01:12 JinZr