DiffGAN-TTS

Can I ask you some questions about mel-spectrogram?

Open Dyongh613 opened this issue 2 years ago • 3 comments

Hi @keonlee9420, I have some questions about the mel-spectrogram shown in the attached image. In that mel-spectrogram, the alignment has been learned, but the fine horizontal details have not emerged yet. What do you think is causing this?

Dyongh613 • Jun 29 '22 02:06

Hi @qw1260497397, thanks for your attention. I need more information about your training. How many training steps had been completed when this mel-spectrogram was generated? What dataset did you use? Did you follow the config in this repo, or did you change something?

At first glance, it seems that more training will solve it.

keonlee9420 • Jun 29 '22 13:06

Hi @keonlee9420. In my work, I use LJSpeech, and I added the diffusion mechanism to PortaSpeech. The first stage was trained for 160,000 steps with a batch size of 64. This spectrogram was generated after 150,000 training steps in the second stage.

Dyongh613 • Jun 29 '22 13:06

Oh, I see. Although I don't know the details of your implementation, I can give you one tip: replace the modules one at a time with the simplest, most reliable architecture you have. For example, you could replace the encoder in PortaSpeech with FastSpeech2's text encoder to check whether the word-to-phoneme alignment is working.
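
One way to organize that kind of module-by-module swap is sketched below. This is only an illustration, not code from this repo: the `ModelConfig` fields, the component labels, and the `BASELINES` table are all hypothetical names chosen for the example, and a real project would map each label to an actual module class.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterator, Tuple


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical slot names and labels for the full (custom) model.
    encoder: str = "portaspeech_encoder"
    aligner: str = "portaspeech_aligner"
    decoder: str = "diffusion_decoder"


# Assumed "simplest but surest" stand-in for each slot (labels are illustrative).
BASELINES: Dict[str, str] = {
    "encoder": "fastspeech2_text_encoder",
    "aligner": "fastspeech2_duration_predictor",
    "decoder": "fastspeech2_mel_decoder",
}


def ablation_configs(full: ModelConfig) -> Iterator[Tuple[str, ModelConfig]]:
    """Yield one config per slot, with only that slot swapped to its baseline."""
    for slot, baseline in BASELINES.items():
        yield slot, replace(full, **{slot: baseline})


if __name__ == "__main__":
    # Train and inspect one run per config; the swap that restores the
    # spectrogram detail points at the module that is likely at fault.
    for slot, cfg in ablation_configs(ModelConfig()):
        print(f"swap {slot}: {cfg}")
```

Changing a single slot per run keeps the comparison clean: if the mel-spectrogram detail recovers only when one particular module is replaced, that module is the first place to look.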

keonlee9420 • Jul 03 '22 15:07