
Pause between sentences

Open chikiuso opened this issue 2 years ago • 3 comments

This is great work; it really does produce high-quality speech. However, it seems to work well only for a single sentence: when I try to generate multiple sentences, there is no pause between them. Could you advise me on how to make it pause between sentences? Thanks.

chikiuso avatar Jun 14 '22 11:06 chikiuso

Which model do you use to convert input sentences into mel spectrograms? I suspect this is the part of your pipeline that fails to add pauses between sentences.

HiFi-GAN is a vocoder: it only converts mel spectrograms into audio. The mel spectrogram itself has to contain a pause in the right place for the vocoder to render several sentences properly.

Anyway, the first step is to inspect the mel spectrogram visually and check whether it has a long enough pause between sentences.
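If looking at a plot is inconvenient, the same check can be done numerically. Below is a minimal sketch, not part of hifigan: `find_pauses` is a hypothetical helper, and it assumes a log-mel array of shape `(n_mels, frames)`, a hop size of 256 samples and a sampling rate of 22050 Hz (take the real values from your own config). The `threshold` has to be chosen by looking at the value range of your own spectrograms.

```python
import numpy as np


def find_pauses(log_mel, threshold, hop_length=256, sample_rate=22050,
                min_pause_s=0.2):
    """Return (start_s, end_s) spans where the mean log-mel level stays
    below `threshold` for at least `min_pause_s` seconds.

    Hypothetical helper for inspection only. hop_length and sample_rate
    are assumed values; use the ones from your own config.
    """
    # Average over mel bins to get one level per frame.
    frame_level = np.asarray(log_mel).mean(axis=0)
    silent = frame_level < threshold

    # Pad with False so every silent run has a detectable start and end.
    padded = np.concatenate(([False], silent, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[0::2], changes[1::2]  # end index is exclusive

    frame_s = hop_length / sample_rate  # duration of one frame in seconds
    return [
        (float(s * frame_s), float(e * frame_s))
        for s, e in zip(starts, ends)
        if (e - s) * frame_s >= min_pause_s
    ]
```

If this returns nothing around the sentence boundary, the pause is already missing in the mel spectrogram, so the vocoder has nothing to render there.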

evrrn avatar Jun 14 '22 13:06 evrrn

Hi @evrrn, thanks for your reply! Is there any way I could add some markup to the text so that the model recognises it and pauses for, let's say, 0.5 seconds between sentences? Thanks.

chikiuso avatar Jun 14 '22 19:06 chikiuso

As I mentioned, it depends on the model you are using to obtain the mel spectrogram. I guess you are using TransformerTTS, so the pause issue depends only on its implementation. It's not a HiFi-GAN problem.
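A model-independent workaround, which is not markup support and not a feature of either project, is to synthesize each sentence separately and join the resulting waveforms with silence. The sketch below is only an illustration: `join_with_pauses` is a hypothetical helper, and it assumes mono float waveforms as NumPy arrays and a sampling rate of 22050 Hz (use the rate from your own config).

```python
import numpy as np


def join_with_pauses(waveforms, sample_rate=22050, pause_s=0.5):
    """Concatenate per-sentence waveforms with `pause_s` seconds of
    silence between them.

    Hypothetical helper; sample_rate is an assumed value and must match
    the rate the vocoder output was generated at.
    """
    silence = np.zeros(int(round(pause_s * sample_rate)), dtype=np.float32)
    pieces = []
    for i, wav in enumerate(waveforms):
        if i > 0:
            # Insert silence before every sentence except the first.
            pieces.append(silence)
        pieces.append(np.asarray(wav, dtype=np.float32))
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)
```

Here each element of `waveforms` would be the vocoder output for one sentence. Hard zeros can sound abrupt if a sentence does not end near silence, so a short fade at the boundaries may be worth adding.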

evrrn avatar Jun 14 '22 22:06 evrrn