nonebot_plugin_tts_gal
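The generated speech is wrong
No errors are raised during generation. I'm using a model I trained myself, and I suspect the problem is in config.json, because a few entries were missing and I added them by hand. This is the file before my changes, where some keys are missing: 修改前的.txt. And this is the file after I manually added some keys: 修改后的.txt
My model file name starts with G rather than D; of course, I renamed it before putting it into the folder under nonebot. Generation succeeds, but the output is just a single buzzing sound.
I don't know how much this has to do with symbols. I don't really understand how to configure them.
symbols is required and must match the symbols used when the model was trained. Find out which symbols were used for training and add them to the config.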
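To illustrate why the symbol table has to match, here is a minimal sketch of how cleaned text is typically mapped to integer IDs through the "symbols" list of config.json. The helper names `load_symbols` and `text_to_ids` and the tiny example config are hypothetical, and the sketch assumes that characters missing from the table are skipped silently, as some VITS inference forks do; the plugin's actual behavior may differ.

```python
import json


def load_symbols(config_text):
    """Return the list stored under the "symbols" key of a config.json string."""
    return json.loads(config_text)["symbols"]


def text_to_ids(cleaned_text, symbols):
    """Map each character of already-cleaned text to its index in the symbol table.

    Returns (ids, unknown), where unknown collects characters the table lacks.
    Assumption: unknown characters are skipped rather than raising an error.
    """
    symbol_to_id = {s: i for i, s in enumerate(symbols)}
    ids, unknown = [], []
    for ch in cleaned_text:
        if ch in symbol_to_id:
            ids.append(symbol_to_id[ch])
        else:
            unknown.append(ch)
    return ids, unknown


# Hypothetical, deliberately tiny symbol table for illustration only.
example_config = '{"symbols": ["_", ",", ".", "a", "b", " "]}'
symbols = load_symbols(example_config)
ids, unknown = text_to_ids("ab, c", symbols)
print(ids, unknown)
```

If the table differs from the one used in training, the same character maps to a different ID (or to none at all), so the model receives a sequence it was never trained on even though nothing raises an error.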
I'm using so-vits-svc 4.0. Where can I find the matching symbols?
so-vits-svc models probably can't be used. This plugin is still based on VITS.
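Oh, I see. I assumed the models were interchangeable because they share the same file extension. Is there a difference between the two kinds of model?
Will so-vits-svc be supported later? I've seen many people online using so-vits-svc for text-to-speech; it should have plenty of users, and plenty of models too.
Or can a so-vits-svc model be converted into a VITS model?
Sorry, I haven't looked into this yet, so I can't answer for now.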
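I tried a VITS model as an experiment, but no matter how long the input sentence is, the bot only outputs about 1 second of audio, and it is just a single sound. No errors are raised during generation. Here is my model's config.json: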
{
  "train": {
    "log_interval": 10,
    "eval_interval": 100,
    "seed": 1234,
    "epochs": 10000,
    "learning_rate": 0.0002,
    "betas": [0.8, 0.99],
    "eps": 1e-09,
    "batch_size": 16,
    "fp16_run": true,
    "lr_decay": 0.999875,
    "segment_size": 8192,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0
  },
  "data": {
    "training_files": "final_annotation_train.txt",
    "validation_files": "final_annotation_val.txt",
    "text_cleaners": ["chinese_cleaners2"],
    "max_wav_value": 32768.0,
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 3,
    "cleaned_text": true
  },
  "model": {
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates": [8, 8, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16, 16, 4, 4],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "speakers": ["Qingzi", "specialweek", "zhongli"],
  "symbols": [
    "_", ",", ".", "!", "?", "-", "~", "\u2026",
    "A", "E", "I", "N", "O", "Q", "U",
    "a", "b", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
    "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
    "\u0283", "\u02a7", "\u02a6", "\u026f", "\u0279", "\u0259", "\u0265",
    "\u207c", "\u02b0", "`", "\u2192", "\u2193", "\u2191", " "
  ]
}
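The only thing I changed is "text_cleaners". When I received the model, "text_cleaners" in the json was "zh_ja_mixture_cleaners", and I changed it to "chinese_cleaners2". Could that be the cause? (I didn't touch anything else.)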
The plugin currently has no zh_ja_mixture_cleaners option. The two cleaners process text differently, which is why the output is wrong.
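A quick pre-flight check along these lines could catch the mismatch before any audio is generated. This is only a sketch: `unsupported_cleaners` is a hypothetical helper, and the `supported` set below is a placeholder rather than the plugin's real list, which should be taken from its documentation or source.

```python
import json


def unsupported_cleaners(config_text, supported):
    """Return the cleaner names in data.text_cleaners that are not in `supported`."""
    cfg = json.loads(config_text)
    return [name for name in cfg["data"]["text_cleaners"] if name not in supported]


# Placeholder set for illustration; NOT the plugin's actual list of cleaners.
supported = {"chinese_cleaners2"}

# The cleaner the model in this thread was originally trained with.
config_text = '{"data": {"text_cleaners": ["zh_ja_mixture_cleaners"]}}'
missing = unsupported_cleaners(config_text, supported)
print(missing)
```

Note that renaming the cleaner in config.json only makes this check pass. It does not change what the model was trained on, so a non-empty result means the model itself does not match the available cleaners.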
I see. What would I need to do to convert zh_ja_mixture_cleaners to chinese_cleaners2? Or is this model simply unusable?
Without changing the code, it can't be used.
I generated another model, this time with "chinese_cleaners" (no 2 at the end). If I change it to "chinese_cleaners2" in the json, that should work too, right?