cnlinxi comments

Results: 19 comments of cnlinxi

Unable to generate correct audio

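@EmptyMoon-hub
1. librosa failed to use PySoundFile, but this should not affect the result.
2. You need to convert the dataset with pypinyin beforehand. The text fed to the model for training must already be in pinyin, not Chinese characters.
3. Your error is that the trained ckpt model file could not be found.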
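A minimal sketch of the pinyin preprocessing step, assuming the pypinyin package (`lazy_pinyin` with `Style.TONE3`, which appends tone digits to each syllable). The `wav_id|pinyin` output line format and the helper names are hypothetical; adapt them to whatever metadata format your copy of the repo actually reads.

```python
def text_to_pinyin(text, converter=None):
    """Convert a Chinese sentence to space-separated pinyin with tone digits.

    By default this uses pypinyin's lazy_pinyin with Style.TONE3 (e.g. 'zhong1').
    `converter` can be any callable mapping str -> list[str]; it is injectable
    so the helper can be tested without pypinyin installed.
    """
    if converter is None:
        # Imported lazily so the module loads even when pypinyin is absent.
        from pypinyin import lazy_pinyin, Style

        def converter(s):
            return lazy_pinyin(s, style=Style.TONE3)
    return " ".join(converter(text))


def build_metadata(pairs, converter=None):
    """Turn (wav_id, chinese_text) pairs into hypothetical 'wav_id|pinyin' lines."""
    return ["%s|%s" % (wav_id, text_to_pinyin(text, converter)) for wav_id, text in pairs]
```

For example, `build_metadata([("000001", "中文数据处理")])` produces one line whose text field is tone-numbered pinyin. Run this once over the whole corpus before training, so the model never sees Chinese characters.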

Chinese data processing

@CathyW77 Yes, you should use pinyin. I usually use the format shown in the second line, but in general the choice makes little difference.

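@CathyW77 Yes, the default `symbols` are used. 50k steps should basically be enough; check the text in your training.txt against the corresponding audio. Also, when you say the results are very poor, poor in what way? Does the alignment fail to line up?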
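A quick script can catch the most common training.txt mistakes before you listen to samples by hand. This is only a sketch: it assumes a hypothetical `wav_name|text` line layout and takes the allowed symbol set as a parameter, because the repo's actual file layout and default `symbols` list are not shown here. It cannot verify that the text matches what is spoken; that still needs a manual spot check.

```python
import os


def check_training_file(path, allowed_symbols, wav_dir="."):
    """Report problems in a hypothetical 'wav_name|text' training list.

    Returns a list of (line_number, message) tuples; an empty list means
    no structural problems were found.
    """
    allowed = set(allowed_symbols)
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines
            parts = line.split("|")
            if len(parts) != 2:
                problems.append((lineno, "expected 2 '|'-separated fields, got %d" % len(parts)))
                continue
            wav_name, text = parts
            # The referenced audio file must exist.
            if not os.path.isfile(os.path.join(wav_dir, wav_name)):
                problems.append((lineno, "missing audio file: " + wav_name))
            if not text.strip():
                problems.append((lineno, "empty text"))
            # Characters outside the symbol set (e.g. leftover Chinese characters).
            unknown = sorted(set(text) - allowed)
            if unknown:
                problems.append((lineno, "characters not in symbol set: " + "".join(unknown)))
    return problems
```

Leftover Chinese characters or stray punctuation showing up as "not in symbol set" is a typical sign that the pinyin conversion was skipped for some lines.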

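@CathyW77 Sorry, I only just saw this. Those are the WaveNet training parameters and can be ignored. This project is a modification of the open-source library mentioned in the README, which ships with training code for a WaveNet vocoder. That vocoder is far too slow, so I recommend not using it.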

the trained model generates different wavs with the same text and reference audio

Please set the reference audio's path in `tacotron_style_reference_audio` in hparams.py, then run synthesis. Feel free to ask if you have more questions.

In hparams.py there is `tacotron_style_alignment=None`; by setting it, you can manually specify the style token alignment weights instead of deriving them from reference audio. Is this what you mean?
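To illustrate what manually specified alignment weights do, here is a NumPy sketch of combining global style tokens with a fixed weight vector instead of attention weights computed from reference audio. It follows the single-head form of the GST paper (tokens pass through tanh before the weighted sum); whether this repo uses exactly this formulation, multiple heads, or a particular shape for `tacotron_style_alignment` is an assumption, and `style_embedding_from_weights` is a hypothetical helper.

```python
import numpy as np


def style_embedding_from_weights(style_tokens, weights):
    """Combine style tokens with manually chosen weights (single-head GST sketch).

    style_tokens: (num_tokens, token_dim) array of learned token embeddings.
    weights: (num_tokens,) weights, e.g. a one-hot vector selects a single token,
             and a scaled one-hot vector (0.3, ...) weakens that token's effect.
    """
    tokens = np.asarray(style_tokens, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    if tokens.ndim != 2 or w.shape != (tokens.shape[0],):
        raise ValueError("need a 2-D token matrix and one weight per token")
    # Weighted sum of tanh-squashed tokens -> (token_dim,) style embedding.
    return w @ np.tanh(tokens)
```

With a one-hot weight vector the result is exactly `tanh` of the chosen token, which makes it easy to listen to what each token has learned.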

@MorganCZY In the original Tacotron 2, dropout stays on during inference, and the same is true here. As a result, every time you synthesize, the generated audio will be different.

@CathyW77 Turning off the dropout in the prenet at synthesis time should be enough. In the **Prenet** class in tacotron/models/modules.py, there is: `` x = tf.layers.dropout(dense, rate=self.drop_rate, training=True, name='dropout_{}'.format(i + 1) + self.scope) `` When synthesizing, set the `training` argument of `tf.layers.dropout()` to False.
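The effect of that `training` argument can be reproduced outside TensorFlow. The NumPy sketch below is not the repo's code: it is a hypothetical `prenet` function (dense + ReLU layers, bias omitted, each followed by inverted dropout, the same scaling `tf.layers.dropout` applies when active). With `training=True` two calls on the same input differ; with `training=False` the output is deterministic.

```python
import numpy as np


def prenet(x, weights, drop_rate=0.5, training=True, rng=None):
    """NumPy sketch of a Tacotron-style prenet with a dropout on/off switch.

    x: (batch, in_dim) input; weights: list of (in_dim, out_dim) matrices.
    training=True keeps dropout active (as in the hardcoded TF line above);
    training=False skips dropout, so repeated calls give identical outputs.
    """
    rng = np.random.default_rng() if rng is None else rng
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # dense layer (bias omitted) + ReLU
        if training:
            keep = rng.random(x.shape) >= drop_rate
            x = x * keep / (1.0 - drop_rate)  # inverted dropout scaling
    return x
```

Note that replacing the hardcoded `training=True` with a literal `False` would also disable prenet dropout during training. Passing a flag that is True for training and False for synthesis avoids that.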

@MorganCZY What do you mean by "correct wav"? Is no audio generated at all?