
[BUG] Some tests and questions about fine-tuning for voice cloning

Open EnochYe opened this issue 1 year ago • 6 comments

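Hello, I am an AI enthusiast. First of all, thank you very much for creating such an excellent open-source project as fish-speech! While trying to fine-tune the fish-speech model for voice cloning, I ran into some problems and would like to ask about the details of fine-tuning. I would appreciate any help you can give.

I have 30 min of speech data from one speaker, on which I ran the following fine-tuning experiments and tests: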

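  1. Test 1
  • Training data: 30 min of audio, automatically sliced, loudness-matched, and labeled with the audio-preprocess library.

  • Hyperparameters: trained for 1000 epochs

  • Cloning results:

    • Test text: '''它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却仍在原地踏步。老板一边耐心地听着他的抱怨。'''

    • Test result: a 10 s reference clip was used as fake.npy; the outputs are "测试1-1000epoch" and "测试1-600epoch" in the attachments.

    • Problems: the generated audio contains slurred and incorrect passages, and the speaking rate and rhythm are odd.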

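  2. Test 2
  • Training data: 5 min of audio, automatically sliced, loudness-matched, and labeled with the audio-preprocess library.

  • Hyperparameters: trained for 1000 epochs

  • Cloning results:

    • Test text: '''它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却仍在原地踏步。老板一边耐心地听着他的抱怨。'''

    • Test result: a 10 s reference clip was used as fake.npy; the output is "测试2-1000epoch" in the attachments.

    • Problems: the generated audio contains slurred and incorrect passages, as well as very long meaningless stretches.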

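  3. Test 3
  • Training data: 5 min of audio, automatically sliced, loudness-matched, and labeled with the audio-preprocess library.

  • Hyperparameters: trained for 100 epochs

  • Cloning results:

    • Test text: '''它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却仍在原地踏步。老板一边耐心地听着他的抱怨。'''

    • Test result: a 10 s reference clip was used as fake.npy; the output is "测试3" in the attachments.

    • Problems: extra sentences appear in the generated audio.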

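In the experiments above I tried both longer and shorter training audio and both more and fewer epochs, but never reached a satisfactory result. Do you have any suggestions for these problems? Is there a stable, reliable setting you use when fine-tuning for voice cloning?

Thanks again for such an excellent project; I look forward to your reply. (GitHub issues cannot take audio files as attachments, so the attachments mentioned above are in this OneDrive folder: https://1drv.ms/f/s!Anj5aIRFC0FNhFPcqUbNXm1EDv8m?e=RdLLjQ)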


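Update, 2024-07-20 10:19:33: After reading through the repository's issues, I saw the author say that fine-tuning data should be 30 min to 1 h long, so I ran another fine-tuning test with 45 min of audio from a different speaker. I tried both the 500-epoch and the 1000-epoch weights, and at inference time used a fake.npy made from roughly 5 s of audio as the reference. The problem of extra sentences appearing in the generated audio still occurs.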

EnochYe avatar Jul 20 '24 02:07 EnochYe

audio (22).zip [image] You could try these parameters.

AnyaCoder avatar Jul 20 '24 03:07 AnyaCoder

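100 epochs carries some risk of overfitting. For the extra sentences at the end, we suggest using the webui's automatic resampling; we are still working on optimizations at the model level.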
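
The resampling idea can also be approximated outside the webui. The sketch below is only an illustration, not the webui's actual implementation: `synthesize` and `transcribe` are hypothetical callables standing in for a TTS call and a speech-recognition call, and the 0.9 threshold is an arbitrary assumption. It regenerates until the transcript of the output is close enough to the requested text, which rejects samples that contain extra trailing sentences.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Tuple


def similarity(expected: str, heard: str) -> float:
    """Character-level similarity in [0, 1] between target text and transcript."""
    return SequenceMatcher(None, expected, heard).ratio()


def resample_until_match(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical: text -> audio bytes
    transcribe: Callable[[bytes], str],  # hypothetical: audio bytes -> text
    threshold: float = 0.9,              # assumed cutoff, tune per language
    max_tries: int = 8,
) -> Tuple[Optional[bytes], float, int]:
    """Return (best_audio, best_score, tries_used).

    Stops early as soon as one sample reaches the threshold; otherwise
    returns the best-scoring sample seen within max_tries attempts.
    """
    best_audio: Optional[bytes] = None
    best_score = -1.0
    for attempt in range(1, max_tries + 1):
        audio = synthesize(text)
        score = similarity(text, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= threshold:
            return best_audio, best_score, attempt
    return best_audio, best_score, max_tries
```

Comparing transcripts catches extra or missing sentences but says nothing about voice quality, so a listening check is still needed on whatever sample is kept.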

leng-yue avatar Jul 20 '24 04:07 leng-yue

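audio (22).zip [image] You could try these parameters.

When I run inference with the fine-tuned model, adding "--max-new-tokens 1024" always causes an error during generation. With the original fish-speech model, generation completes normally, but the output quality does not improve.

Perhaps I have misunderstood the "Maximum tokens per batch" parameter.

Below are my generation command, test results, and error details: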

  • Command: '''python tools/llama/generate.py \
    --text "它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却 仍在原地踏步。老板一边耐心地听着他的抱怨。" \
    --prompt-text "杭州体育与你同行大家好,欢迎收看今天的杭州体育家,我是马谦。" \
    --prompt-tokens ./idr_test_data/MaQian/fake.npy \
    --checkpoint-path ./checkpoints/maqian_100 \
    --num-samples 8 \
    --compile \
    --max-new-tokens 1024'''
  • Test results: https://1drv.ms/f/s!Anj5aIRFC0FNhFhO32JqvTV87CGu?e=loGut9
  • Error details:
    > python tools/llama/generate.py \
        --text "它的宽大的叶子也是片片向上。就像这白杨树一样傲然挺立的守卫他们家乡的哨兵。难道你又不更远一点。想到这样枝枝叶叶靠紧团结。而那个叫布鲁诺的小伙子却 仍在原地踏步。老板一边耐心地听着他的抱怨。" \
        --prompt-text "杭州体育与你同行大家好,欢迎收看今天的杭州体育家,我是马谦。" \
        --prompt-tokens ./idr_test_data/MaQian/fake.npy \
        --checkpoint-path ./checkpoints/maqian_100 \
        --num-samples 8 \
        --compile \
        --max-new-tokens 1024
    2024-07-21 13:04:47.783 | INFO | __main__:main:639 - Loading model ...
    2024-07-21 13:04:53.314 | INFO | __main__:load_model:347 - Restored model from checkpoint
    2024-07-21 13:04:53.314 | INFO | __main__:load_model:351 - Using DualARTransformer
    2024-07-21 13:04:53.314 | INFO | __main__:load_model:357 - Compiling function...
    2024-07-21 13:04:53.319 | INFO | __main__:main:648 - Time to load model: 5.54 seconds
    2024-07-21 13:04:53.336 | INFO | __main__:generate_long:432 - Encoded text: 它的宽大的叶子也是片片向上.
    2024-07-21 13:04:53.337 | INFO | __main__:generate_long:432 - Encoded text: 就像这白杨树一样傲然挺立的守卫他们家乡的哨兵.难道你又不更远一点.
    2024-07-21 13:04:53.337 | INFO | __main__:generate_long:432 - Encoded text: 想到这样枝枝叶叶靠紧团结.而那个叫布鲁诺的小伙子却.仍在原地踏步.
    2024-07-21 13:04:53.337 | INFO | __main__:generate_long:432 - Encoded text: 老板一边耐心地听着他的抱怨.
    2024-07-21 13:04:53.338 | INFO | __main__:generate_long:450 - Generating sentence 1/4 of sample 1/8
    0%| | 0/1023 [00:00<?, ?it/s]
    /data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
      warnings.warn(
    0%| | 1/1023 [00:17<5:02:04, 17.73s/it]
    /data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
      warnings.warn(
    0%|▏ | 2/1023 [00:17<2:06:02, 7.41s/it]
    /data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
      warnings.warn(
    83%|██████████████████████████████████████████████████████████████████████████▋ | 849/1023 [00:20<00:04, 41.76it/s]
    2024-07-21 13:05:14.000 | INFO | __main__:generate_long:496 - Compilation time: 20.66 seconds
    2024-07-21 13:05:14.000 | INFO | __main__:generate_long:505 - Generated 851 tokens in 20.66 seconds, 41.19 tokens/sec
    2024-07-21 13:05:14.000 | INFO | __main__:generate_long:508 - Bandwidth achieved: 20.19 GB/s
    2024-07-21 13:05:14.001 | INFO | __main__:generate_long:513 - GPU Memory used: 1.10 GB
    2024-07-21 13:05:14.001 | INFO | __main__:main:681 - Sampled text: 它的宽大的叶子也是片片向上.
    2024-07-21 13:05:14.001 | INFO | __main__:generate_long:450 - Generating sentence 2/4 of sample 1/8
    11%|█████████▍ | 108/1023 [00:00<00:02, 351.97it/s]
    <frozen importlib._bootstrap_external>:883: _call_with_frames_removed: block: [0,0,0], thread: [0,0,0] Assertion `index out of bounds: 0 <= tmp4 < 1344` failed.
    <frozen importlib._bootstrap_external>:883: _call_with_frames_removed: block: [0,0,0], thread: [1,0,0] Assertion `index out of bounds: 0 <= tmp4 < 1344` failed.
    [... the same assertion is printed for every thread from [2,0,0] through [126,0,0] ...]
    <frozen importlib._bootstrap_external>:883: _call_with_frames_removed: block: [0,0,0], thread: [127,0,0] Assertion `index out of bounds: 0 <= tmp4 < 1344` failed.
    14%|████████████ | 139/1023 [00:00<00:02, 337.01it/s]
    Traceback (most recent call last):
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 694, in <module>
        main()
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
        return self.main(*args, **kwargs)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 1078, in main
        rv = self.invoke(ctx)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/click/core.py", line 783, in invoke
        return __callback(*args, **kwargs)
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 678, in main
        for response in generator:
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 484, in generate_long
        y = generate(
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/data-disk/users/users/miniconda3/envs/fish-speech/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 261, in generate
        x = decode_n_tokens(
      File "/data-disk/users/users/repository/fish-speech/tools/llama/generate.py", line 201, in decode_n_tokens
        if cur_token[0, 0, -1] == im_end_id:
    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
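
The error message itself recommends rerunning with `CUDA_LAUNCH_BLOCKING=1`, so that the Python traceback points at the call that actually failed. The sketch below shows one way to set that up from Python. Dropping `--compile` for the debug run is an assumption, not something confirmed in this thread: the out-of-bounds assertion comes from a compiled kernel, so an eager run may produce a more readable error. The helper names are hypothetical, and the command is abbreviated.

```python
import os
import subprocess
from typing import Dict, List, Mapping


def debug_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy an environment and force synchronous CUDA kernel launches."""
    env = dict(base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def without_flag(args: List[str], flag: str) -> List[str]:
    """Return args with a boolean flag (one that takes no value) removed."""
    return [a for a in args if a != flag]


# Abbreviated form of the command from the report. Add --text and
# --prompt-text exactly as in the original invocation before running.
COMMAND = [
    "python", "tools/llama/generate.py",
    "--prompt-tokens", "./idr_test_data/MaQian/fake.npy",
    "--checkpoint-path", "./checkpoints/maqian_100",
    "--num-samples", "8",
    "--compile",
    "--max-new-tokens", "1024",
]


def run_debug(command: List[str]) -> int:
    """Run the command eagerly with blocking launches; returns the exit code.

    Assumption: the working directory is the fish-speech repository root.
    """
    result = subprocess.run(
        without_flag(command, "--compile"),
        env=debug_env(os.environ),
        check=False,
    )
    return result.returncode
```

Calling `run_debug(COMMAND)` reruns generation once; if the failure reproduces, the traceback should then name the indexing operation that exceeded the 1344 bound.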

Have you run into anything similar, and do you have any suggestions?

EnochYe avatar Jul 21 '24 05:07 EnochYe

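100 epochs carries some risk of overfitting. For the extra sentences at the end, we suggest using the webui's automatic resampling; we are still working on optimizations at the model level.

Yes, I tried fine-tuning for both 100 and 1000 epochs, and the model trained for fewer epochs does indeed work better.

However, the problem of extra speech being generated for short sentences is still fairly serious: only about 2 out of 8 generations sound good. Do you have any other tricks or suggestions?

My test results are attached: https://1drv.ms/f/s!Anj5aIRFC0FNhFhO32JqvTV87CGu?e=loGut9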
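
Taking the figure above at face value (about 2 good results out of 8, a per-sample success rate of roughly 0.25) gives a rough idea of how many resamples are needed. Assuming the samples are independent, which is a simplification, the probability of at least one good result in n tries is 1 - (1 - p)^n. The function name below is illustrative.

```python
import math


def tries_needed(p_good: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - p_good)**n >= confidence."""
    if not 0.0 < p_good < 1.0:
        raise ValueError("p_good must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


p = 2 / 8  # success rate reported in the thread

# Tries needed for a 95% chance of at least one good sample.
print(tries_needed(p))

# Chance that a single batch of 8 samples contains at least one good one.
print(round(1 - (1 - p) ** 8, 3))
```

Under these assumptions a batch of 8 yields at least one usable sample about 90% of the time, and 11 tries are needed to reach 95%.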

EnochYe avatar Jul 21 '24 05:07 EnochYe

Please describe your hardware.

AnyaCoder avatar Jul 21 '24 05:07 AnyaCoder

Please describe your hardware.

2024-07-21 at 15 48 17@2x

EnochYe avatar Jul 21 '24 07:07 EnochYe

Please describe your hardware.

2024-07-21 at 15 48 17@2x

@EnochYe Did you manage to solve this problem? I am hitting the same issue during inference.

David-19940718 avatar Dec 19 '24 08:12 David-19940718

Please describe your hardware.

2024-07-21 at 15 48 17@2x

@EnochYe Did you manage to solve this problem? I am hitting the same issue during inference.

No. In the end I gave up on fine-tuning and switched to f5tts.

EnochYe avatar Dec 20 '24 07:12 EnochYe

I also find the fine-tuning results unsatisfactory. Are there any special requirements for the data?

XieCong157312 avatar Aug 25 '25 07:08 XieCong157312