WaveRNN

The voice is a bit shaky and a bit hoarse

Open freecui opened this issue 5 years ago • 25 comments

Please listen to my result. Some words or characters sound a bit shaky and a bit hoarse, the shakiness especially. What could be the cause? 1350.zip

freecui avatar Feb 18 '20 13:02 freecui

(I translated your question with google translate.)

Set the parameter "voc_gen_batched" to False in your hparams.py. Although batched WaveRNN is much faster than the original WaveRNN, it is a trade-off. As the batch size increases (so the number of samples in each batch entry decreases), audio generation gets faster but the quality of the generated audio gets worse.

If you disable batched generation, audio generation will be very slow, but it will ultimately give the best results.
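For reference, the change is a single line in hparams.py. This is only a fragment showing the one setting named above; the comment wording is an illustration and the rest of the file is left untouched.

```python
# hparams.py (fragment): vocoder generation settings
# False = generate each utterance sequentially in one piece (slow, best quality)
# True  = split the utterance into segments and generate them as a batch (fast)
voc_gen_batched = False
```

This setting only affects generation, so an already trained vocoder checkpoint can be reused as it is.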

mindmapper15 avatar Feb 21 '20 05:02 mindmapper15

Thank you very much. I used to set voc_gen_batched = True; I will train again with voc_gen_batched = False.

freecui avatar Feb 22 '20 01:02 freecui

You don't need to re-train your vocoder. voc_gen_batched is for inference only.

mindmapper15 avatar Feb 23 '20 01:02 mindmapper15

The audio quality is better when I set voc_gen_batched = False for inference, but the generation time for this utterance increased from 33.42 seconds to 170 seconds. I want to do real-time TTS; can you give me some advice?

freecui avatar Feb 25 '20 03:02 freecui

@freecui I implemented my own batched-mode WaveRNN, which generates multiple "unbatched" audio clips at once (unbatched meaning that a single audio clip is not split into multiple segments).

It's still slower than the original batched mode and consumes tons of VRAM, but it is way faster than generating audio clips one by one in unbatched mode.

Maybe you should try that way.
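A rough sketch of the bookkeeping such a mode needs, under assumptions: several whole mel spectrograms are padded to a common length so they can run as one batch, and each generated waveform is trimmed back to its true length afterwards. The function names, the list-of-frames layout, and `hop_length` are hypothetical and are not taken from the repository; the vocoder call itself is left out.

```python
from typing import List, Tuple

Frame = List[float]  # one mel frame (hypothetical layout: n_mels floats)


def pad_batch(mels: List[List[Frame]],
              pad_value: float = 0.0) -> Tuple[List[List[Frame]], List[int]]:
    """Pad whole utterances to the same number of frames; keep true lengths."""
    lengths = [len(m) for m in mels]
    max_len = max(lengths)
    n_mels = len(mels[0][0])
    padded = [
        m + [[pad_value] * n_mels for _ in range(max_len - len(m))]
        for m in mels
    ]
    return padded, lengths


def trim_outputs(wavs: List[List[float]], lengths: List[int],
                 hop_length: int) -> List[List[float]]:
    """Cut each generated waveform back to its own frames * hop_length samples."""
    return [w[: n * hop_length] for w, n in zip(wavs, lengths)]
```

Because no utterance is split, there are no segment boundaries to crossfade, but every utterance in the batch costs as many steps as the longest one, which is why sorting inputs by length before batching tends to help.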

I was focusing more on TTS than on WaveRNN, so I still don't know how to get the best result with batched single-audio mode.

As I said, batched WaveRNN inference is a trade-off. If you want the best quality together with faster generation, you'd better implement a feature that generates multiple unbatched audio clips at once.

If you care more about generation time than about quality, find hp.voc_target and hp.voc_overlap values that give an acceptable balance between generation time and quality.
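To get a feel for that trade-off, here is a small sketch that estimates how many segments a clip is split into for given target and overlap values. It assumes each segment advances by target + overlap samples and that a final partial segment is padded out; the function name, the 22,050 Hz sample rate, and the numbers in the usage lines are illustrative assumptions, not measured settings.

```python
import math


def estimate_segments(total_samples: int, target: int, overlap: int) -> int:
    """Estimate how many overlapping segments a clip is split into.

    Assumes consecutive segments share `overlap` samples, so each new
    segment advances by target + overlap samples.
    """
    step = target + overlap
    return max(1, math.ceil((total_samples - overlap) / step))


# Hypothetical 5-second clip at an assumed 22,050 Hz sample rate.
clip = 5 * 22_050
few_large = estimate_segments(clip, target=22_000, overlap=550)
many_small = estimate_segments(clip, target=11_000, overlap=550)
```

A smaller target gives more, shorter segments (faster, but more boundaries where artifacts can appear); a larger target gives fewer, longer segments (slower, closer to unbatched quality).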

mindmapper15 avatar Feb 25 '20 07:02 mindmapper15

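Hello! I'd like to ask you a few questions. Did you train the Chinese synthesis yourself? Where did the training data come from? Does this model support Chinese? Looking forward to your reply!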

zhangzhenyuyu avatar Mar 10 '20 14:03 zhangzhenyuyu

@freecui Would you share your config file? Thanks a lot.

OswaldoBornemann avatar Mar 12 '20 03:03 OswaldoBornemann

@freecui Would you please share your WaveRNN training loss?

OswaldoBornemann avatar Mar 13 '20 02:03 OswaldoBornemann

@zhangzhenyuyu, the training data is internal data; the model does support Chinese.

freecui avatar Mar 15 '20 03:03 freecui

@tsungruihon, we can use the default parameters.

freecui avatar Mar 15 '20 03:03 freecui

@freecui Glad to hear that. That's a really amazing result. Would you mind sharing your WeChat so that we can keep in touch? I also work on Chinese TTS and ASR. My email is [email protected]

OswaldoBornemann avatar Mar 15 '20 12:03 OswaldoBornemann

@freecui By the way, may I ask how many epochs or steps you have trained for?

OswaldoBornemann avatar Mar 16 '20 13:03 OswaldoBornemann

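@freecui May I ask: to train on Chinese speech, do I need to change hparams or any other files? I saw tts_cleaner_names = ['english_cleaners'] in hparams.py and I'm not sure whether it should be changed for Chinese.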

justln1113 avatar Mar 17 '20 07:03 justln1113

@justln1113, yes, that needs to be changed, to basic_clearners

freecui avatar Mar 17 '20 11:03 freecui

@freecui OK, thanks for the reply. Is there anything else I should watch out for?

justln1113 avatar Mar 17 '20 14:03 justln1113

@freecui Hello, may I ask what loss and step count your training reached? Thanks.

OswaldoBornemann avatar Mar 18 '20 03:03 OswaldoBornemann

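@freecui Sorry to bother you again. I used ts_cleaner_names = ['basic_clearners'] and ran into an error: the resulting input x is always empty. I wonder whether I should use transliteration_cleaners instead. Looking forward to your reply, thanks!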

zhangzhenyuyu avatar Mar 21 '20 01:03 zhangzhenyuyu

For Chinese, the basic cleaners work only if your input is pinyin or phoneme characters.
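A minimal sketch of why raw Chinese characters can end up as empty input, assuming the text front end lowercases the text, collapses whitespace, and then drops every character that is not in its symbol table. The symbol set and function names below are hypothetical; in particular, check whether tone digits are actually present in the symbol set you train with.

```python
import re

# Hypothetical symbol set: ASCII letters, tone digits, space, basic punctuation.
SYMBOLS = set("abcdefghijklmnopqrstuvwxyz0123456789 ,.!?'-")


def basic_clean(text: str) -> str:
    """Lowercase and collapse whitespace, roughly what a basic cleaner does."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def keep_known_symbols(text: str) -> str:
    """Drop every character that the (hypothetical) symbol table does not know."""
    return "".join(ch for ch in basic_clean(text) if ch in SYMBOLS)


hanzi_result = keep_known_symbols("你好世界")              # nothing survives
pinyin_result = keep_known_symbols("Ni3 hao3  shi4 jie4")  # pinyin plus tones survives
```

Under these assumptions the Chinese-character input is filtered down to an empty string, while pinyin with tone numbers passes through unchanged apart from lowercasing and whitespace cleanup.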

SnowInHokkaido avatar Mar 30 '20 02:03 SnowInHokkaido

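@freecui Bro, do you use pinyin to generate Chinese speech? What does the pinyin format look like? I have trained for 900K steps following your method, and the speech generated from pinyin is still not satisfactory.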

jerryname2022 avatar Apr 21 '20 07:04 jerryname2022

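It's just pinyin characters plus tones. If the audio is unsatisfactory, you can try lowering the learning rate after training for a certain number of steps.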
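One way to express "lower the learning rate after a certain number of steps" is a stepwise schedule. The sketch below is generic; the function name, the step boundaries, and the rates are hypothetical placeholders, not values recommended in this thread.

```python
from typing import List, Tuple


def learning_rate(step: int, schedule: List[Tuple[int, float]]) -> float:
    """Return the rate for `step` from (until_step, lr) pairs sorted by step."""
    for until_step, lr in schedule:
        if step < until_step:
            return lr
    # Past the last boundary: keep using the final (lowest) rate.
    return schedule[-1][1]


# Hypothetical schedule: start at 1e-4, then drop twice.
SCHEDULE = [(500_000, 1e-4), (1_000_000, 5e-5), (1_500_000, 1e-5)]
```

In a training loop, the returned value would be written into the optimizer's learning rate whenever a boundary is crossed.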

SnowInHokkaido avatar Apr 21 '20 10:04 SnowInHokkaido

I trained on the LJSpeech dataset; I'm not sure whether that has something to do with it.

jerryname2022 avatar Apr 21 '20 10:04 jerryname2022

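I want to train a more natural-sounding voice; right now it sounds very mechanical. The result in the first post sounds fine to me, and that is the result I'm after.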

jerryname2022 avatar Apr 21 '20 10:04 jerryname2022

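Hello. How were your results when training on the LJSpeech dataset? Roughly what loss value did you reach? After 500K steps of training my results are still very poor. I hope you can help.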

1zxLi avatar Jun 10 '20 02:06 1zxLi

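I ran into the same problem as the poster above. I sampled 5,000 utterances from the VCTK dataset to train WaveRNN (MOL mode) from scratch with batch size = 64, and after 450k steps the results are still terrible. I would sincerely appreciate your advice: is there anything I should watch out for? Loss curve: 31a6e8c899b5e9e5f479d0fc641843c

Audio generated at 400K steps: 2bccfba6a7efa4e1dc6874c85644243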

xuexidi avatar Sep 25 '20 15:09 xuexidi

Did you solve it in the end? I trained on aishell3 and ran into the same problem.

zhaoyun630 avatar Nov 30 '21 02:11 zhaoyun630