WaveRNN voice sounds a bit shaky and hoarse

Please listen to my result. Some words or characters sound a bit shaky and hoarse, especially shaky, and I don't know what causes it. 1350.zip

Feb 18 '20 13:02 freecui

(I translated your question with google translate.)

Set the parameter "voc_gen_batched" to False in your hparams.py. Although batched WaveRNN is much faster than the original WaveRNN, it is a trade-off feature: as the batch size increases (i.e., each batch entry holds fewer samples), audio generation gets faster but the quality of the generated audio gets worse.

If you disable batched generation, audio generation will be very slow, but it will produce the best results.
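To make the trade-off concrete: as far as I understand it, batched mode splits the waveform to be generated into overlapping segments, generates all of them in parallel, and crossfades the overlaps back together. Each segment starts without the true preceding samples, so every seam is a place where artefacts can creep in, and a smaller segment size means more seams. The numpy code below is only a simplified sketch of that fold/crossfade idea, not the code in this repo; `target` and `overlap` stand in for hp.voc_target and hp.voc_overlap.

```python
import numpy as np


def fold_with_overlap(x, target, overlap):
    """Split a 1-D signal into overlapping segments that could be generated in parallel.

    Each segment has `target` samples of its own plus `overlap` samples shared
    with each neighbour. Returns the stacked segments and the original length.
    """
    total = len(x)
    step = target + overlap          # distance between segment starts
    seg_len = target + 2 * overlap   # every segment has the same length
    n = max(1, int(np.ceil((total - overlap) / step)))
    padded_len = (n - 1) * step + seg_len
    # Zero-pad the tail so the last segment is full length.
    x = np.pad(np.asarray(x, dtype=float), (0, padded_len - total))
    segments = np.stack([x[i * step : i * step + seg_len] for i in range(n)])
    return segments, total


def xfade_and_unfold(segments, overlap, total):
    """Overlap-add the segments with a linear crossfade and trim to `total` samples."""
    n, seg_len = segments.shape
    step = seg_len - overlap
    out = np.zeros((n - 1) * step + seg_len)
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in
    for i, seg in enumerate(segments):
        seg = seg.astype(float)  # astype returns a copy, so the input is untouched
        if overlap and i > 0:
            seg[:overlap] *= fade_in    # region shared with the previous segment
        if overlap and i < n - 1:
            seg[-overlap:] *= fade_out  # region shared with the next segment
        out[i * step : i * step + seg_len] += seg
    return out[:total]


x = np.arange(100, dtype=float)
segs, total = fold_with_overlap(x, target=20, overlap=5)
y = xfade_and_unfold(segs, overlap=5, total=total)
print(segs.shape, np.allclose(y, x))
```

Here the segments are cut from a known signal, so the crossfade reconstructs it exactly; in real generation each segment is produced independently, and the crossfade can only smooth over mismatches at the seams.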

Feb 21 '20 05:02 mindmapper15

Thank you very much. I had set voc_gen_batched = True; I will train again with voc_gen_batched = False.

Feb 22 '20 01:02 freecui

You don't need to re-train your vocoder. voc_gen_batched is for inference only.

Feb 23 '20 01:02 mindmapper15

The audio sounds better when I set voc_gen_batched = False for inference, but the generation time for this utterance increased from 33.42 seconds to 170 seconds. I want to do real-time TTS; can you give me some advice?

Feb 25 '20 03:02 freecui

@freecui I implemented my own batched-mode WaveRNN that generates multiple "unbatched" audio clips at once (unbatched meaning a single audio clip isn't split into multiple segments).

It's still slower than the original batched mode and consumes tons of VRAM, but it's much faster than generating clips one by one in unbatched mode.
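Batching several complete utterances, instead of splitting one utterance into segments, mostly comes down to padding the inputs and trimming the outputs. This is not mindmapper15's code, just a minimal numpy sketch of that bookkeeping, assuming mels shaped (n_mels, frames) and a hop length of 275 samples per frame (an assumption; use your own hp.hop_length). The generation call itself is left out because it depends on how you modify the model.

```python
import numpy as np


def pad_mel_batch(mels):
    """Stack variable-length mels of shape (n_mels, frames) into one batch.

    Shorter mels are zero-padded on the right (assumed to mean silence);
    the true frame counts are returned so outputs can be trimmed later.
    """
    lengths = [m.shape[1] for m in mels]
    n_mels = mels[0].shape[0]
    batch = np.zeros((len(mels), n_mels, max(lengths)), dtype=np.float32)
    for i, m in enumerate(mels):
        batch[i, :, : m.shape[1]] = m
    return batch, lengths


def trim_batch_audio(audio, lengths, hop_length):
    """Cut each generated waveform back to its own length (frames * hop_length)."""
    return [audio[i, : n * hop_length] for i, n in enumerate(lengths)]


mels = [np.ones((80, 3)), np.ones((80, 5))]
batch, lengths = pad_mel_batch(mels)
# audio = generate(batch)  # hypothetical: your modified vocoder, one waveform per row
audio = np.zeros((len(mels), max(lengths) * 275))
clips = trim_batch_audio(audio, lengths, hop_length=275)
print(batch.shape, [c.shape for c in clips])
```

Every row runs as many steps as the longest utterance, so grouping utterances of similar length keeps padding and wasted compute small. Zero-padding is also an assumption: use whatever value means silence under your mel normalisation.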

Maybe you should try that way.

I was focusing more on TTS than on WaveRNN, so I still don't know how to get the best result with batched single-audio mode.

As I said, batched WaveRNN inference is a trade-off feature. If you want both the best results and faster generation, you'd better implement a feature that generates multiple unbatched audio clips at once.

If you care more about generation time than quality, find hp.voc_target and hp.voc_overlap values that give an acceptable balance between generation time and quality.
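A rough way to reason about hp.voc_target and hp.voc_overlap: in batched mode the number of sequential autoregressive steps is roughly one segment length (target + 2 * overlap) instead of the full sample count, while the number of batch entries (and seams, and VRAM) grows as the target shrinks. The helper below is a back-of-the-envelope model under those assumptions, not a measurement of the repo's code; the 22050 Hz sample rate and the parameter values are only examples.

```python
import math


def batched_cost(num_samples, target, overlap):
    """Rough cost model for batched generation of one utterance.

    Returns (batch_entries, sequential_steps): how many segments run in
    parallel, and how many autoregressive steps each segment needs.
    """
    step = target + overlap
    entries = max(1, math.ceil((num_samples - overlap) / step))
    return entries, target + 2 * overlap


clip = 5 * 22050  # a 5-second clip at an assumed 22050 Hz
print(batched_cost(clip, target=11000, overlap=550))
print(batched_cost(clip, target=4000, overlap=400))
```

For this 110250-sample clip, target 11000 / overlap 550 gives 10 entries of 12100 steps each, and target 4000 / overlap 400 gives 25 entries of 4800 steps, versus 110250 sequential steps unbatched. Real speed also depends on how well the GPU handles the larger batch.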

Feb 25 '20 07:02 mindmapper15

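Hello! I'd like to ask you a few questions. Did you train the Chinese synthesis yourself? Where did the training data come from? Does this model support Chinese? Looking forward to your reply!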

Mar 10 '20 14:03 zhangzhenyuyu

@freecui would you share your config file? Thanks a lot.

Mar 12 '20 03:03 OswaldoBornemann

@freecui Would you please share your wavernn training loss ?

Mar 13 '20 02:03 OswaldoBornemann

@zhangzhenyuyu, the training data is internal data; the model does support Chinese.

Mar 15 '20 03:03 freecui

@tsungruihon, we can use the default parameters.

Mar 15 '20 03:03 freecui

@freecui Glad to hear that. That's a really amazing result. Would you mind sharing your WeChat so that we could communicate? I also focus on Chinese TTS and ASR. My email is [email protected]

Mar 15 '20 12:03 OswaldoBornemann

@freecui By the way, may I ask how many epochs or steps you have trained?

Mar 16 '20 13:03 OswaldoBornemann

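@freecui May I ask: to train on Chinese speech, do I need to change hparams or any other files? I saw tts_cleaner_names = ['english_cleaners'] in hparams.py and I'm not sure whether it should be changed for Chinese.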

Mar 17 '20 07:03 justln1113

@justln1113, yes, this needs to be changed to basic_cleaners.

Mar 17 '20 11:03 freecui

@freecui OK, thanks for the reply. Is there anything else I should pay attention to?

Mar 17 '20 14:03 justln1113

@freecui Hi, may I ask what loss your training reached and after how many steps? Thanks.

Mar 18 '20 03:03 OswaldoBornemann

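@freecui Sorry to bother you again. When I use tts_cleaner_names = ['basic_cleaners'] I run into an error: the input x I get is always empty. I'm wondering whether I should use transliteration_cleaners instead. Looking forward to your reply, thanks!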

Mar 21 '20 01:03 zhangzhenyuyu

For Chinese, basic_cleaners only works if your input consists of pinyin or phoneme characters.
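Why the input comes out empty: as I understand it, this repo's text front end (derived from keithito's Tacotron) keeps only characters that appear in its symbol list and silently drops everything else, while basic_cleaners only lowercases and collapses whitespace. Chinese characters are not in the symbol list, so they all disappear, whereas pinyin with tone digits survives. The sketch below reproduces that behaviour with a hypothetical, simplified symbol set; `SYMBOLS`, `basic_cleaners_like` and `text_to_ids` are illustrative names, not the repo's code.

```python
import re

# Hypothetical, simplified symbol set: lowercase letters, tone digits 1-5,
# space and a few punctuation marks. The repo's real symbol list differs.
SYMBOLS = list("abcdefghijklmnopqrstuvwxyz12345 ,.!?'-")
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def basic_cleaners_like(text):
    """Lowercase and collapse runs of whitespace; no transliteration."""
    return re.sub(r"\s+", " ", text.lower())


def text_to_ids(text):
    """Map cleaned text to ids, silently dropping characters not in SYMBOLS."""
    return [SYMBOL_TO_ID[c] for c in basic_cleaners_like(text) if c in SYMBOL_TO_ID]


print(text_to_ids("你好"))        # Chinese characters are all dropped
print(text_to_ids("Ni3  Hao3"))   # pinyin + tone digits is kept
```

So with Chinese characters as input the sequence is empty; converting the text to pinyin (or phonemes) before cleaning, and making sure those symbols are in the symbol list, avoids it.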

Mar 30 '20 02:03 SnowInHokkaido

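@freecui Bro, do you generate your Chinese speech from pinyin? What does the pinyin format look like? I've trained for 900K steps following your approach, but the speech generated from pinyin is still unsatisfactory.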

Apr 21 '20 07:04 jerryname2022

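It's just pinyin letters plus tones. If the sound is unsatisfactory, you can try lowering the learning rate after training for a certain number of steps.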
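Lowering the learning rate after a certain number of steps can be written as a staged schedule. A minimal sketch; the function name and the step counts and rates below are hypothetical and illustrative, not recommendations from this thread.

```python
def lr_for_step(step, stages):
    """Return the learning rate for `step` from a staged schedule.

    `stages` is a list of (lr, end_step) pairs sorted by end_step; each
    stage applies until its end_step is reached. Past the last stage,
    the last rate is kept.
    """
    for lr, end_step in stages:
        if step < end_step:
            return lr
    return stages[-1][0]


# Hypothetical schedule: drop the rate 10x after 300k steps and again after 600k.
schedule = [(1e-4, 300_000), (1e-5, 600_000), (1e-6, 1_000_000)]

for step in (0, 300_000, 900_000):
    print(step, lr_for_step(step, schedule))
```

With a PyTorch optimizer you would apply the value by setting `g['lr'] = lr_for_step(step, schedule)` for each `g` in `optimizer.param_groups`. If I recall correctly, hparams.py already describes Tacotron training in stages through `tts_schedule`, each with its own learning rate, so adding a later stage with a smaller rate there should have the same effect.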

Apr 21 '20 10:04 SnowInHokkaido

I trained on the LJSpeech dataset; I'm not sure whether that's related.

Apr 21 '20 10:04 jerryname2022

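I'm trying to train speech that sounds more natural; right now it sounds very mechanical. The result in the first post sounds fine to me, and that's the kind of result I want.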

Apr 21 '20 10:04 jerryname2022

Hi, how did your training on the LJSpeech dataset turn out? Roughly what was your loss? After 500K steps my results are still poor. I'd really appreciate your help.