FastSpeech2
Symbols included punctuation
I noticed that punctuation is included in text/symbols.py, which means punctuation should be encoded. But:
- In the TextGrid files generated by MFA, there is no punctuation, so the model will never see punctuation in training.
- In synthesize.py, all punctuation marks are replaced by {sp}.
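To make the second point concrete, here is a minimal sketch of that kind of replacement. It is an illustration under my own assumptions, not the code in synthesize.py: the helper name `punctuation_to_sp` and the example phone list are hypothetical.

```python
import re

def punctuation_to_sp(tokens):
    """Hypothetical helper: wrap each token in braces, then map punctuation to {sp}."""
    phones = "{" + "}{".join(tokens) + "}"
    # A braced token holding exactly one non-word, non-space character
    # is treated as punctuation and collapsed into the single pause token.
    return re.sub(r"\{[^\w\s]\}", "{sp}", phones)

# Assumed phone sequence for "hello, world." with punctuation kept as tokens.
result = punctuation_to_sp(["HH", "AH0", "L", "OW1", ",", "W", "ER1", "L", "D", "."])
print(result)
```

With a rule like this, "," and "." (and also "(", "@", and so on) all end up as the same {sp} token, which is what the question is about.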
So can you walk me through it?
Hi @phamlehuy53, in the TextGrid files generated by the MFA tool, all punctuation is modeled by sp.
Of course. So the model just learns sp, not punctuation (even though punctuation is in text.symbols), which makes those symbols worthless.
And in synthesis, all punctuation marks are replaced by sp, while some of them, like ()""@ and so on, are not actually an sp.
This repo depends on the MFA tool to get the duration of each phoneme, so the way punctuation is modeled is determined by MFA. I think you should ask the owner of MFA rather than FS2.
Btw, Tacotron is an example of a model that can handle all punctuation the way you want, because it learns the alignment in an unsupervised manner for each encoded character. But in my opinion we should model frequent punctuation marks instead of all possible ones.
I am not sure I understand this right, but if we use only {sp} for all punctuation, the silence durations of "," and "." will be the same, right? I have trained my own FS2 model using the MFA tool to generate the TextGrid (TG) files, and sometimes it skips punctuation. Does anyone know the cause of this and how to fix it?
- To keep things simple for my data, I ignored the difference in duration between "," and ".". Of course, this should be changed if your dataset needs it.
- I should read the MFA change log carefully. In some recent versions, {sp} is removed from the output.
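If the missing {sp} is the cause, one possible workaround is to normalize the labels when reading the alignments. The sketch below assumes the newer output leaves pause intervals with an empty label instead of sp; I have not confirmed that against the change log, so check your own TextGrid files first. The function name and the (start, end, label) tuple layout are hypothetical and not tied to any TextGrid library.

```python
def normalize_intervals(intervals):
    """Map empty-label intervals to "sp" so pauses look like the older output.

    `intervals` is a list of (start, end, label) tuples (assumed layout).
    """
    normalized = []
    for start, end, label in intervals:
        label = label.strip()
        if label == "":
            # Assumption: an empty label marks a pause in newer MFA output.
            label = "sp"
        normalized.append((start, end, label))
    return normalized

# Made-up example intervals, only to show the shape of the data.
example = [(0.0, 0.1, ""), (0.1, 0.3, "HH"), (0.3, 0.4, "sp")]
fixed = normalize_intervals(example)
print(fixed)
```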
@EuphoriaCelestial MFA does not use punctuation to predict silence; it predicts silence in an unsupervised manner. You can see that your TextGrid files contain only sp as silence (except for sil at the beginning), so when using the MFA tool you can remove the punctuation from your input, for example: hello, world -> hello world.
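A minimal sketch of that cleanup step, assuming plain English transcripts. The helper name `strip_punctuation` is made up for illustration, and keeping apostrophes (for words like "don't") is my own choice, not something MFA requires.

```python
import re

def strip_punctuation(text):
    """Hypothetical helper: drop punctuation before passing a transcript to MFA."""
    # Replace anything that is not a word character, whitespace, or apostrophe
    # with a space, then collapse the repeated whitespace left behind.
    no_punct = re.sub(r"[^\w\s']", " ", text)
    return " ".join(no_punct.split())

cleaned = strip_punctuation("hello, world")
print(cleaned)  # hello world
```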
@leminhnguyen @EuphoriaCelestial What is the relevance of having only alphabet characters? (I don't mean phonemes such as AH, IY; I mean tokens that are made from the list of alphabet letters.)