FastSpeech2

Symbols included punctuation

Open huypl53 opened this issue 3 years ago • 7 comments

I noticed that punctuation is included in text/symbols.py, which means punctuation should be encoded. But:

  1. In the TextGrid files generated by MFA, there is no punctuation, so the model never sees punctuation during training.
  2. In synthesize.py, all punctuation is replaced by {sp}.

So can you walk me through it?

huypl53 avatar Aug 13 '21 07:08 huypl53

Hi @phamlehuy53, in the TextGrid files generated by the MFA tool, all punctuation is modeled as sp.

leminhnguyen avatar Aug 14 '21 03:08 leminhnguyen

> Hi @phamlehuy53, in the TextGrid files generated by the MFA tool, all punctuation is modeled as sp.

Of course. But then the model only learns sp, not punctuation (even though punctuation is in text.symbols), so those symbols are useless. And during synthesis, all punctuation is replaced by sp, even though some marks, such as ()""@ and so on, are not actually pauses.
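To make the behaviour described above concrete, here is a minimal sketch of what replacing every punctuation token with sp looks like. It is an illustration only, not the repository's actual code: the helper name `punct_to_sp` and the sample phoneme list are hypothetical.

```python
import re

def punct_to_sp(tokens):
    """Replace any token made up only of punctuation with the 'sp' token.

    Hypothetical helper for illustration; not taken from synthesize.py.
    """
    out = []
    for tok in tokens:
        # A token with no letters, digits, or whitespace is treated as punctuation.
        if re.fullmatch(r"[^\w\s]+", tok):
            out.append("sp")
        else:
            out.append(tok)
    return out

# Assumed input: phonemes for "hello, (world)." with punctuation kept as tokens.
tokens = ["HH", "AH0", "L", "OW1", ",", "(", "W", "ER1", "L", "D", ")", "."]
print(punct_to_sp(tokens))
```

Under this scheme the comma, the parentheses, and the period all collapse to the same sp token, which is the point being raised here: marks that do not correspond to a pause are treated exactly like those that do.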

huypl53 avatar Aug 15 '21 04:08 huypl53

Because this repo depends on the MFA tool to get the duration of each phoneme, the way punctuation is modeled is determined by MFA, so I think you should ask the owner of MFA instead of FS2.

Btw, Tacotron is an example of a model that can handle all punctuation the way you want, because it learns the alignment for each encoded character in an unsupervised manner. But in my opinion we should model frequent punctuation marks instead of all possible ones.

leminhnguyen avatar Aug 15 '21 04:08 leminhnguyen

I am not sure if I understand this right, but if we use only {sp} for all punctuation, the silence duration of "," and "." will be the same, right? I have trained my own FS2 model using the MFA tool to generate the TextGrid files, and sometimes it skips punctuation. Does anyone know the cause of this and how to fix it?

EuphoriaCelestial avatar Oct 04 '21 02:10 EuphoriaCelestial

> I am not sure if I understand this right, but if we use only {sp} for all punctuation, the silence duration of "," and "." will be the same, right? I have trained my own FS2 model using the MFA tool to generate the TextGrid files, and sometimes it skips punctuation. Does anyone know the cause of this and how to fix it?

  • To keep my data easy to work with, I ignored the difference in duration between "," and ".". Of course, this should be changed if your dataset needs it.
  • I should read the MFA changelog more carefully. In some recent versions, {sp} is removed from the output.

huypl53 avatar Oct 04 '21 07:10 huypl53

@EuphoriaCelestial MFA doesn't use punctuation to predict silence; it predicts silence in an unsupervised manner. You can see that your TextGrid files only contain sp as the silence (except for sil at the beginning), so when using the MFA tool you can remove the punctuation from your input, for example: hello, world -> hello world.
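The suggestion above (hello, world -> hello world) could be applied to transcripts before alignment with something like the following. This is a minimal sketch under assumptions: `strip_punctuation` is a hypothetical helper, and keeping apostrophes is a choice made here so that contractions stay intact, not something MFA requires.

```python
import re

def strip_punctuation(text):
    """Remove punctuation and collapse whitespace before alignment.

    Hypothetical helper; apostrophes are kept so words like "don't" survive.
    """
    # Replace anything that is not a word character, whitespace, or apostrophe.
    no_punct = re.sub(r"[^\w\s']", " ", text)
    # Collapse the runs of spaces left behind by the removed marks.
    return re.sub(r"\s+", " ", no_punct).strip()

print(strip_punctuation("hello, world"))
```

Replacing each mark with a space rather than deleting it avoids gluing neighbouring words together when the punctuation was the only separator.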

leminhnguyen avatar Oct 04 '21 09:10 leminhnguyen

@leminhnguyen @EuphoriaCelestial what is the relevance of having only letters of the alphabet as tokens? (I don't mean phonemes like AH, IY; I mean tokens made from the list of alphabet letters.)

debasishaimonk avatar Jul 18 '23 20:07 debasishaimonk