
[BUG] Inference fails after entering the inference UI with Compile Model checked; with NO selected, inference works

[Open] moyutegong opened this issue 9 months ago · 13 comments

[screenshot] The error log is:

Traceback (most recent call last):
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/route_utils.py", line 270, in call_process_api
    output = await app.get_blocks().process_api(
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/blocks.py", line 1897, in process_api
    data = await self.postprocess_data(fn_index, result["prediction"], state)
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/blocks.py", line 1673, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/blocks.py", line 1649, in validate_outputs
    raise ValueError(
ValueError: An event handler (inference_wrapper) didn't receive enough output values (needed: 5, received: 4).
Wanted outputs:
    [<gradio.components.audio.Audio object at 0x7fd1bc759f90>, <gradio.components.audio.Audio object at 0x7fd1bc759a80>, <gradio.components.audio.Audio object at 0x7fd1bc759d50>, <gradio.components.audio.Audio object at 0x7fd1bc759e70>, <gradio.components.html.HTML object at 0x7fd1bc72bd90>]
Received outputs:
    [None, <gradio.components.audio.Audio object at 0x7fd0bacbaad0>, <gradio.components.audio.Audio object at 0x7fd0c80ccdf0>, None]   

Running under WSL2.

moyutegong · May 14 '24 09:05

It also seems that too much input text triggers this problem: I pasted in roughly 10,000 characters, with Compile Model set to NO.

 30%|█████████████████████████████▌                                                                      | 173/585 [00:08<00:21, 19.32it/s]
2024-05-14 18:30:43.006 | INFO     | tools.llama.generate:generate_long:581 - Generated 175 tokens in 10.18 seconds, 17.19 tokens/sec      
2024-05-14 18:30:43.006 | INFO     | tools.llama.generate:generate_long:584 - Bandwidth achieved: 17.65 GB/s
2024-05-14 18:30:43.006 | INFO     | tools.llama.generate:generate_long:589 - GPU Memory used: 5.19 GB
2024-05-14 18:30:43.007 | INFO     | tools.llama.generate:generate_long:527 - Generating sentence 14/106 of sample 1/1
2024-05-14 18:30:43.009 | INFO     | tools.api:decode_vq_tokens:135 - VQ features: torch.Size([2, 173])
2024-05-14 18:30:43.012 | INFO     | tools.api:decode_vq_tokens:150 - Restored VQ features: torch.Size([1, 768, 692])
Traceback (most recent call last):
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/route_utils.py", line 270, in call_process_api
    output = await app.get_blocks().process_api(
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/blocks.py", line 1897, in process_api
    data = await self.postprocess_data(fn_index, result["prediction"], state)
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/blocks.py", line 1673, in postprocess_data
    self.validate_outputs(fn_index, predictions)  # type: ignore
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/blocks.py", line 1649, in validate_outputs
    raise ValueError(
ValueError: An event handler (inference_wrapper) didn't receive enough output values (needed: 5, received: 4).
Wanted outputs:
    [<gradio.components.audio.Audio object at 0x7efee98aaa40>, <gradio.components.audio.Audio object at 0x7efee98a8b50>, <gradio.components.audio.Audio object at 0x7efee98a84c0>, <gradio.components.audio.Audio object at 0x7efee98a8a90>, <gradio.components.html.HTML object at 0x7efee98a8910>]
Received outputs:
    [None, <gradio.components.audio.Audio object at 0x7efe82c81300>, <gradio.components.audio.Audio object at 0x7effbea19450>, None]       

moyutegong · May 14 '24 10:05

[screenshot] I can't reproduce the problem you describe; I'm using the same parameters as you.

AnyaCoder · May 14 '24 11:05

Please check whether the compilation requirements are met: [screenshots]

AnyaCoder · May 14 '24 11:05

[screenshots] The compilation requirements are met.

moyutegong · May 14 '24 11:05

This looks like the same problem as #194. I suspect there is a bug in the exception handling of the multi-audio inference path.
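
For context: the traceback comes from Gradio's output validation. The handler is wired to five output components (four Audio plus one HTML), so every return path, including the exception path, must produce exactly five values. Below is a minimal sketch of this failure mode and a defensive fix; the helper names are hypothetical, and this is not the actual fish-speech code:

```python
def synthesize(text: str) -> list:
    """Hypothetical stand-in for the real multi-audio inference."""
    raise RuntimeError("compiled model failed")

def inference_wrapper_buggy(text: str):
    try:
        audios = synthesize(text)           # up to 4 audio clips
        return *audios, "<p>OK</p>"
    except Exception as e:
        # Bug: only 4 values on the error path ->
        # "didn't receive enough output values (needed: 5, received: 4)"
        return None, None, f"<p>{e}</p>", None

def inference_wrapper_fixed(text: str):
    try:
        audios = list(synthesize(text))
        audios = (audios + [None] * 4)[:4]  # always fill all 4 Audio slots
        return *audios, "<p>OK</p>"
    except Exception as e:
        # Error path also returns exactly 5 values
        return None, None, None, None, f"<p>Error: {e}</p>"
```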

leng-yue · May 14 '24 11:05

Is your code up to date? Incidentally, under WSL2 I found that streaming audio raises no error but produces no sound. I'll look into it when I have time.

AnyaCoder · May 14 '24 11:05

The code is up to date.

moyutegong · May 14 '24 11:05

[screenshot] Above is the result of generating 5 minutes of audio. I'm not sure what's going on either, but we'll work out a more robust solution later.

AnyaCoder · May 14 '24 11:05

The code is up to date.

Try updating the code anyway to see whether that fixes it: git pull

AnyaCoder · May 14 '24 12:05

I suspect a text-splitting problem. With input of about 3,000 characters containing only periods and commas, synthesis works fine; those 3,000 characters are mostly short sentences. When I then fed in the text that previously failed, it broke on one long passage of roughly 200 characters that contains 5-6 commas and no line breaks. Synthesizing that passage on its own works, though. After preprocessing the normal text, the synthesis log looks like this:
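
If the splitter only breaks on sentence-final punctuation, a 200-character run held together by commas would pass through as a single oversized chunk. A sketch of a guard against that, illustrative only and not fish-speech's actual splitter, with an invented length limit:

```python
import re

MAX_CHARS = 100  # illustrative limit, not the project's real threshold

def split_text(text: str) -> list[str]:
    """Split on sentence enders first, then force-split oversized chunks."""
    chunks: list[str] = []
    for s in re.split(r"(?<=[。.!?!?])", text):
        s = s.strip()
        while len(s) > MAX_CHARS:
            # Fall back to the last comma (full-width or ASCII) in range.
            cut = max(s.rfind(",", 0, MAX_CHARS), s.rfind(",", 0, MAX_CHARS))
            if cut <= 0:
                cut = MAX_CHARS - 1  # hard cut if no comma is found
            chunks.append(s[:cut + 1])
            s = s[cut + 1:].strip()
        if s:
            chunks.append(s)
    return chunks
```

With a fallback like this, the roughly 200-character comma-only passage would still be cut into pieces the model can handle.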

2024-05-14 20:44:12.058 | INFO     | tools.api:encode_reference:112 - Encoded prompt: torch.Size([2, 205])
2024-05-14 20:44:12.353 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 许多痛苦,包含亲人朋友的许多不满,其原因只有一个,不在于人的年老,而在于人的性格.
2024-05-14 20:44:12.353 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 假设他们是大大方方,心平气和的人,年老对他们称不上是太大的痛苦.
2024-05-14 20:44:12.354 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 要不然的话,年轻轻的照样也少不了烦恼..一个好人,同时忍受贫困,老年,固然不容易,可是坏人虽富,到了老年其内心也是得不到满足与宁静的.
2024-05-14 20:44:12.354 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: .除了接受无知之罚外还能有什么别的吗?而受无知之罚显然就是我对有智慧的人学习.
2024-05-14 20:44:12.355 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: .善良的人便不肯为名为利来当官,他们不肯为了职务公开拿钱被人当仆人看待,更不肯假公济私,暗中舞弊,被人当作小偷.
2024-05-14 20:44:12.355 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 名誉也不能动其心,原因是他们并没有野心,于是要他们愿意当官就只得用惩罚来强制了,这就怪不得大家看不起那些没有受到强迫,就自己想要当官的人,可最大的惩罚还是你不去管人,却让比你坏的人来管你来了,我想象,好人怕这个惩罚,因此勉强出来,他们不是为了自己的荣华富贵,而是迫不得已,实在找不到比他们更好的或同样好的人来担负这个责任,假设全国都是好人,大家会争着不当官,象现在争着大家要当官一样热烈,那时候才会看得出来,一位真正的治国者追求的不是他自己的利益,而是老百姓的利益,所以有识之士宁可受人之惠,也不愿多管闲事加惠于人.
2024-05-14 20:44:12.356 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: .心灵不也是这样的吗?最勇敢,最智慧的心灵最不容易被任何外界的影响所干扰或者改变了.
2024-05-14 20:44:12.357 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 万事万物那么都是这样的了. --任何事物处于最好状况之下,'不论是天然的状况最好,还是人为的状况最好,或两种状况都最好',是最不容易被别的东西所改变的.
2024-05-14 20:44:12.358 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: .一位儿童从小受了好的教育,节奏与和谐浸入了他的心灵深处,在那里牢牢地生了根,他便会变得温文有礼,如果受了坏的教育,结果就会相反.
2024-05-14 20:44:12.358 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 再者,一个受过适宜教育的儿童,对于人工作品或者自然物的缺点也最敏感,因而对丑恶的东西会非常反感,对优美的东西会非常赞赏,感受其鼓舞,并且从中吸取营养,使自己的心灵成长得既美且善.
2024-05-14 20:44:12.358 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 对任何丑恶的东西,他能象嫌恶臭不自觉地加以谴责,虽他还年幼,还知其然而不知其所以然.
2024-05-14 20:44:12.359 | INFO     | tools.llama.generate:generate_long:509 - Encoded text: 等到长大成人,理智来临,他会似曾相识,向前欢迎,由于他所受的教养,令他同气相求,这是十分自然的嘛.
2024-05-14 20:44:12.359 | INFO     | tools.llama.generate:generate_long:527 - Generating sentence 1/12 of sample 1/1
 12%|██████████████▎                                                                                                      | 195/1588 [00:09<01:09, 19.99it/s]
2024-05-14 20:44:22.429 | INFO     | tools.llama.generate:generate_long:581 - Generated 197 tokens in 10.07 seconds, 19.56 tokens/sec
2024-05-14 20:44:22.429 | INFO     | tools.llama.generate:generate_long:584 - Bandwidth achieved: 20.08 GB/s
2024-05-14 20:44:22.430 | INFO     | tools.llama.generate:generate_long:589 - GPU Memory used: 6.28 GB
2024-05-14 20:44:22.430 | INFO     | tools.llama.generate:generate_long:527 - Generating sentence 2/12 of sample 1/1
2024-05-14 20:44:22.431 | INFO     | tools.api:decode_vq_tokens:135 - VQ features: torch.Size([2, 195])
2024-05-14 20:44:22.452 | INFO     | tools.api:decode_vq_tokens:150 - Restored VQ features: torch.Size([1, 768, 780])
 13%|███████████████▏                                                                                                     | 166/1283 [00:07<00:53, 21.05it/s]
2024-05-14 20:44:30.922 | INFO     | tools.llama.generate:generate_long:581 - Generated 168 tokens in 8.49 seconds, 19.78 tokens/sec
2024-05-14 20:44:30.923 | INFO     | tools.llama.generate:generate_long:584 - Bandwidth achieved: 20.31 GB/s
2024-05-14 20:44:30.923 | INFO     | tools.llama.generate:generate_long:589 - GPU Memory used: 6.28 GB
2024-05-14 20:44:30.923 | INFO     | tools.llama.generate:generate_long:527 - Generating sentence 3/12 of sample 1/1
2024-05-14 20:44:30.924 | INFO     | tools.api:decode_vq_tokens:135 - VQ features: torch.Size([2, 166])
2024-05-14 20:44:30.947 | INFO     | tools.api:decode_vq_tokens:150 - Restored VQ features: torch.Size([1, 768, 664])
 39%|█████████████████████████████████████████████▋                                                                        | 354/914 [00:17<00:27, 20.25it/s]
2024-05-14 20:44:49.160 | INFO     | tools.llama.generate:generate_long:581 - Generated 356 tokens in 18.24 seconds, 19.52 tokens/sec
2024-05-14 20:44:49.160 | INFO     | tools.llama.generate:generate_long:584 - Bandwidth achieved: 20.04 GB/s
2024-05-14 20:44:49.160 | INFO     | tools.llama.generate:generate_long:589 - GPU Memory used: 6.28 GB
2024-05-14 20:44:49.161 | INFO     | tools.llama.generate:generate_long:527 - Generating sentence 4/12 of sample 1/1
2024-05-14 20:44:49.162 | INFO     | tools.api:decode_vq_tokens:135 - VQ features: torch.Size([2, 354])
2024-05-14 20:44:49.181 | INFO     | tools.api:decode_vq_tokens:150 - Restored VQ features: torch.Size([1, 768, 1416])
 29%|█████████████████████████████████▋                                                                                    | 201/705 [00:09<00:25, 20.12it/s]
2024-05-14 20:45:00.162 | INFO     | tools.llama.generate:generate_long:581 - Generated 203 tokens in 11.00 seconds, 18.45 tokens/sec
2024-05-14 20:45:00.162 | INFO     | tools.llama.generate:generate_long:584 - Bandwidth achieved: 18.94 GB/s
2024-05-14 20:45:00.162 | INFO     | tools.llama.generate:generate_long:589 - GPU Memory used: 6.28 GB
2024-05-14 20:45:00.163 | INFO     | tools.llama.generate:generate_long:527 - Generating sentence 5/12 of sample 1/1
2024-05-14 20:45:00.164 | INFO     | tools.api:decode_vq_tokens:135 - VQ features: torch.Size([2, 201])
2024-05-14 20:45:00.185 | INFO     | tools.api:decode_vq_tokens:150 - Restored VQ features: torch.Size([1, 768, 804])
 29%|██████████████████████████████████▊                                                                                   | 261/885 [00:12<00:30, 20.22it/s]
2024-05-14 20:45:13.880 | INFO     | tools.llama.generate:generate_long:581 - Generated 263 tokens in 13.72 seconds, 19.17 tokens/sec
2024-05-14 20:45:13.881 | INFO     | tools.llama.generate:generate_long:584 - Bandwidth achieved: 19.68 GB/s
2024-05-14 20:45:13.881 | INFO     | tools.llama.generate:generate_long:589 - GPU Memory used: 6.28 GB
2024-05-14 20:45:13.881 | INFO     | tools.llama.generate:generate_long:527 - Generating sentence 6/12 of sample 1/1
2024-05-14 20:45:13.882 | INFO     | tools.api:decode_vq_tokens:135 - VQ features: torch.Size([2, 261])
2024-05-14 20:45:13.901 | INFO     | tools.api:decode_vq_tokens:150 - Restored VQ features: torch.Size([1, 768, 1044])

The test text is as follows:

许多痛苦,包含亲人朋友的许多不满,其原因只有一个,不在于人的年老,而在于人的性格。 假设他们是大大方方,心平气和的人,年老对他们称不上是太大的痛苦。要不然的话,年轻轻的照样也少不了烦恼。
一个好人,同时忍受贫困、老年,固然不容易,可是坏人虽富,到了老年其内心也是得不到满足与宁静的。  
除了接受无知之罚外还能有什么别的吗?而受无知之罚显然就是我对有智慧的人学习。
善良的人便不肯为名为利来当官,他们不肯为了职务公开拿钱被人当仆人看待,更不肯假公济私,暗中舞弊,被人当作小偷。 名誉也不能动其心,原因是他们并没有野心,于是要他们愿意当官就只得用惩罚来强制了,这就怪不得大家看不起那些没有受到强迫,就自己想要当官的人,可最大的惩罚还是你不去管人,却让比你坏的人来管你来了,我想象,好人怕这个惩罚,因此勉强出来,他们不是为了自己的荣华富贵,而是迫不得已,实在找不到比他们更好的或同样好的人来担负这个责任,假设全国都是好人,大家会争着不当官,象现在争着大家要当官一样热烈,那时候才会看得出来,一位真正的治国者追求的不是他自己的利益,而是老百姓的利益,所以有识之士宁可受人之惠,也不愿多管闲事加惠于人。 
心灵不也是这样的吗?最勇敢、最智慧的心灵最不容易被任何外界的影响所干扰或者改变了。万事万物那么都是这样的了。 ——任何事物处于最好状况之下,(不论是天然的状况最好,还是人为的状况最好,或两种状况都最好),是最不容易被别的东西所改变的。
一位儿童从小受了好的教育,节奏与和谐浸入了他的心灵深处,在那里牢牢地生了根,他便会变得温文有礼;如果受了坏的教育,结果就会相反。 再者,一个受过适宜教育的儿童,对于人工作品或者自然物的缺点也最敏感,因而对丑恶的东西会非常反感,对优美的东西会非常赞赏,感受其鼓舞,并且从中吸取营养,使自己的心灵成长得既美且善。 对任何丑恶的东西,他能象嫌恶臭不自觉地加以谴责,虽他还年幼,还知其然而不知其所以然。 等到长大成人,理智来临,他会似曾相识,向前欢迎,由于他所受的教养,令他同气相求,这是十分自然的嘛。

Failure screenshot: [screenshot] Result of a successful synthesis: 1715690792831

moyutegong · May 14 '24 12:05

[screenshot]

Found the problem: the encoded length of a single sentence must not exceed 2048 tokens, and yours is over the limit at 2840 tokens.
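
A sketch of how such a guard could be enforced before generation, assuming a Hugging Face-style tokenizer; the checkpoint path and helper name are illustrative, not taken from the fish-speech codebase:

```python
from transformers import AutoTokenizer

MAX_TOKENS = 2048  # per-sentence limit mentioned above

# Hypothetical checkpoint path, for illustration only.
tokenizer = AutoTokenizer.from_pretrained("checkpoints/text2semantic")

def enforce_token_limit(sentence: str) -> list[str]:
    """Recursively split a sentence until each piece encodes under the limit."""
    if len(tokenizer.encode(sentence)) <= MAX_TOKENS:
        return [sentence]
    # Over the limit (e.g. 2840 tokens): cut near the middle, preferring
    # a comma boundary, then recurse on both halves.
    mid = len(sentence) // 2
    cut = max(sentence.rfind(",", 0, mid), sentence.rfind(",", 0, mid))
    if cut <= 0:
        cut = mid
    return (enforce_token_limit(sentence[:cut + 1])
            + enforce_token_limit(sentence[cut + 1:]))
```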

AnyaCoder · May 14 '24 14:05

However, with Compile Model set to YES, inference still fails. I recreated the conda environment and reinstalled the dependencies, but compilation still doesn't work.

moyutegong · May 14 '24 14:05

I think I've found the cause: when running inference through the webUI, half precision is off by default, which prevents 20-series GPUs from compiling the model. After changing --half to default to True, inference is more than 5x faster. I suggest adding a toggle to switch half precision on and off.
Before compiling: 1715701084145
After compiling: 1715701084148

1715701170924
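
For context: RTX 20-series (Turing) GPUs have no native bfloat16 support, so a bf16 default could plausibly break compilation there, which would fit the behavior above; that reading is an inference from this thread, not confirmed. A sketch of the suggested UI switch in plain Gradio, with illustrative component and callback names:

```python
import gradio as gr
import torch

def set_precision(use_half: bool) -> str:
    """Hypothetical callback: re-cast the loaded model's weights."""
    dtype = torch.half if use_half else torch.bfloat16
    # model.to(dtype=dtype)  # applied to the globally loaded model
    return f"Inference dtype set to {dtype}"

with gr.Blocks() as demo:
    half_checkbox = gr.Checkbox(value=True, label="Half precision (--half)")
    status = gr.HTML()
    half_checkbox.change(set_precision, inputs=half_checkbox, outputs=status)
```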

moyutegong · May 14 '24 15:05