FastSpeech2 Symbols included punctuation

I noticed that punctuation is included in text/symbols.py, which suggests punctuation should be encoded. But:

In the TextGrid files generated by MFA, there is no punctuation, so the model never sees punctuation during training.
In synthesize.py, all punctuation marks are replaced by {sp}.

So can you walk me through it?

Aug 13 '21 07:08 huypl53

Hi @phamlehuy53, in the TextGrid files generated by the MFA tool, all punctuation is modeled by sp.

Aug 14 '21 03:08 leminhnguyen

> Hi @phamlehuy53, in the TextGrid files generated by the MFA tool, all punctuation is modeled by sp.

Of course. So the model only learns sp, not punctuation (even though punctuation is in text.symbols), so those symbols are worthless. And during synthesis, all punctuation marks are replaced by sp, even though some of them, such as ( ) " " @ and so on, are not actually pauses.
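
To make the synthesis-time behaviour concrete, here is a minimal sketch of the kind of substitution being described. It is not the repository's actual code: the regexes, the `TOY_LEXICON` dictionary, and the function name `to_phoneme_string` are assumptions made up for illustration.

```python
import re

# Hypothetical toy lexicon; a real setup would load a pronunciation dictionary.
TOY_LEXICON = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}


def to_phoneme_string(text):
    """Map known words to phonemes and every punctuation mark to sp."""
    # Split into word tokens and single non-space, non-letter tokens.
    tokens = re.findall(r"[A-Za-z']+|[^\sA-Za-z']", text)
    phones = []
    for tok in tokens:
        if tok.lower() in TOY_LEXICON:
            phones.extend(TOY_LEXICON[tok.lower()])
        elif re.fullmatch(r"[^\w\s]", tok):
            # Any punctuation mark, pause-like or not, becomes sp.
            phones.append("sp")
        # Unknown words are skipped in this sketch (a real system would use G2P).
    return "{" + " ".join(phones) + "}"


print(to_phoneme_string("hello, world."))
print(to_phoneme_string("(hello) world"))
```

In this sketch the parentheses in the second call are turned into sp just like the comma and the full stop in the first call, which is the behaviour I am questioning.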

Aug 15 '21 04:08 huypl53

Because this repo depends on the MFA tool to get the duration of each phoneme, the way punctuation is modeled is determined by MFA, so I think you should ask the owner of MFA rather than FS2.

Btw, Tacotron is an example of a model that can handle all punctuation the way you want, because it learns the alignment for each encoded character in an unsupervised manner. But in my opinion we should model frequent punctuation marks rather than all possible ones.
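
As a rough illustration of what "frequent punctuation" could mean in practice, the sketch below counts punctuation marks over a list of transcripts and keeps the most common ones. The function name `frequent_punctuation`, the `top_k` parameter, and the sample transcripts are all hypothetical; this is one possible way to pick the marks, not something FS2 or MFA provides.

```python
import string
from collections import Counter


def frequent_punctuation(transcripts, top_k=3):
    """Return the top_k most common ASCII punctuation marks in the transcripts."""
    counts = Counter(
        ch for line in transcripts for ch in line if ch in string.punctuation
    )
    return [mark for mark, _ in counts.most_common(top_k)]


# Made-up sample data, only for demonstration.
transcripts = ["Hello, world.", "Well, yes. (Maybe.)", "Really?"]
print(frequent_punctuation(transcripts, top_k=2))
```

Only the marks returned here would get their own symbols; everything else could keep falling back to sp.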

Aug 15 '21 04:08 leminhnguyen

I am not sure if I understand this right, but if we use only {sp} for all punctuation, the silence durations of "," and "." will be the same, right? I have trained my own FS2 model using the MFA tool to generate the TextGrid files, and sometimes it skips punctuation. Does anyone know the cause of this and how to fix it?

Oct 04 '21 02:10 EuphoriaCelestial

> I am not sure if I understand this right, but if we use only {sp} for all punctuation, the silence durations of "," and "." will be the same, right? I have trained my own FS2 model using the MFA tool to generate the TextGrid files, and sometimes it skips punctuation. Does anyone know the cause of this and how to fix it?

To keep things simple on my data, I ignored the difference in duration between "," and ".". Of course, this should be changed if your dataset needs it.
I should read the MFA changelog carefully. In some recent versions, they removed {sp} from the output.

Oct 04 '21 07:10 huypl53

@EuphoriaCelestial MFA doesn't use punctuation to predict silence; it predicts silence in an unsupervised manner. You can see that your TextGrid files only contain sp as the silence (except for sil at the beginning), so when using the MFA tool you can remove the punctuation from your input, for example: hello, world -> hello world.
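
A minimal sketch of that cleanup step, assuming plain-text transcripts. The helper name `strip_punctuation` is made up, and keeping apostrophes (so that words like "don't" survive) is my own assumption rather than an MFA requirement.

```python
import re


def strip_punctuation(text):
    """Replace punctuation with spaces, then collapse repeated whitespace."""
    # Keep word characters, whitespace, and apostrophes; blank out the rest.
    cleaned = re.sub(r"[^\w\s']", " ", text)
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_punctuation("hello, world"))  # hello world
```

Punctuation is replaced with a space instead of being deleted so that hyphenated words are split into two words rather than glued together.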

Oct 04 '21 09:10 leminhnguyen

@leminhnguyen @EuphoriaCelestial What is the relevance of having only alphabet characters as symbols? (I don't mean phonemes like AH or IY; I mean tokens made from the letters of the alphabet.)

Jul 18 '23 20:07 debasishaimonk
