Error when using 3s fast voice cloning
I started the webui with the command from the official documentation:
python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M
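After startup the interface opens normally, as shown below.
The above is my configuration. After I click Run, an error is reported, and the terminal prints the following: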
2024-10-28 11:15:36,546 INFO get zero_shot inference request
0%| | 0/1 [00:00<?, ?it/s]2024-10-28 11:15:55,785 INFO synthesis text 我是通义实验室语音团队全新推出的生成式语音大模型,提供舒适自然的语音合成能力。
Exception in thread Thread-7:
Traceback (most recent call last):
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "D:\projects\ai\CosyVoice\cosyvoice\cli\model.py", line 93, in llm_job
for i in self.llm.inference(text=text.to(self.device),
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "D:\projects\ai\CosyVoice\cosyvoice\llm\llm.py", line 172, in inference
text, text_len = self.encode(text, text_len)
File "D:\projects\ai\CosyVoice\cosyvoice\llm\llm.py", line 75, in encode
encoder_out, encoder_mask = self.text_encoder(text, text_lengths, decoding_chunk_size=1, num_decoding_left_chunks=-1)
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__/cosyvoice/transformer/encoder/___torch_mangle_5.py", line 22, in forward
masks = torch.bitwise_not(torch.unsqueeze(mask, 1))
embed = self.embed
_0 = torch.add(torch.matmul(xs, CONSTANTS.c0), CONSTANTS.c1)
~~~~~~~~~~~~ <--- HERE
input = torch.layer_norm(_0, [1024], CONSTANTS.c2, CONSTANTS.c3)
pos_enc = embed.pos_enc
Traceback of TorchScript, original code (most recent call last):
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
0%| | 0/1 [00:19<?, ?it/s]
Traceback (most recent call last):
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\queueing.py", line 521, in process_events
response = await route_utils.call_process_api(
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\blocks.py", line 1945, in process_api
result = await self.call_function(
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\blocks.py", line 1525, in call_function
prediction = await utils.async_iteration(iterator)
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\utils.py", line 655, in async_iteration
return await iterator.__anext__()
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\utils.py", line 648, in __anext__
return await anyio.to_thread.run_sync(
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\anyio\_backends\_asyncio.py", line 2364, in run_sync_in_worker_thread
return await future
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\anyio\_backends\_asyncio.py", line 864, in run
result = context.run(func, *args)
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\utils.py", line 631, in run_sync_iterator_async
return next(iterator)
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\gradio\utils.py", line 814, in gen_wrapper
response = next(iterator)
File "webui.py", line 120, in generate_audio
for i in cosyvoice.inference_zero_shot(tts_text, prompt_text, prompt_speech_16k, stream=stream, speed=speed):
File "D:\projects\ai\CosyVoice\cosyvoice\cli\cosyvoice.py", line 73, in inference_zero_shot
for model_output in self.model.tts(**model_input, stream=stream, speed=speed):
File "D:\projects\ai\CosyVoice\cosyvoice\cli\model.py", line 191, in tts
this_tts_speech = self.token2wav(token=this_tts_speech_token,
File "D:\projects\ai\CosyVoice\cosyvoice\cli\model.py", line 104, in token2wav
tts_mel, flow_cache = self.flow.inference(token=token.to(self.device),
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\projects\ai\CosyVoice\cosyvoice\flow\flow.py", line 123, in inference
token = self.input_embedding(torch.clamp(token, min=0)) * mask
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\torch\nn\modules\sparse.py", line 162, in forward
return F.embedding(
File "D:\ProgramData\miniconda3\envs\cosyvoice\lib\site-packages\torch\nn\functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
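Thanks to @铝箱-CosyVoice开发者 in the DingTalk group for the answer: changing the initialization code on line 184 of webui.py to the following makes it run.
cosyvoice = CosyVoice(args.model_dir, load_jit=False, fp16=False)
However, it is extremely slow: generating the audio took 300 seconds after clicking the button. My machine is not particularly low-spec, so is there any way to improve the speed?
The GPU is an RTX 3050 with 4 GB of VRAM.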
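The workaround above hardcodes both flags to False. A minimal sketch of choosing them at startup instead is shown here. It assumes, as the "not implemented for 'Half'" error suggests, that the half-precision and TorchScript paths should only be enabled when the model actually runs on a CUDA device. The helper name choose_init_flags is hypothetical and is not part of CosyVoice; only the load_jit and fp16 keyword arguments come from the line quoted above.

```python
def choose_init_flags(cuda_available: bool) -> dict:
    """Return keyword arguments for the CosyVoice constructor.

    Hypothetical helper (not part of CosyVoice). Assumption: fp16 weights
    and the exported TorchScript modules are only usable on a CUDA device,
    so both are switched off when inference falls back to the CPU.
    """
    return {"load_jit": cuda_available, "fp16": cuda_available}


# Possible use inside webui.py (requires torch and the CosyVoice repo):
#   import torch
#   flags = choose_init_flags(torch.cuda.is_available())
#   cosyvoice = CosyVoice(args.model_dir, **flags)
```

If the flags come out as False on a machine that has an NVIDIA card, that points to the installed PyTorch not seeing the GPU rather than to the model itself.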
Most likely your GPU is not being used. Judging by the error, half precision is not supported on your side.
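A quick way to check this is to ask PyTorch what it can see from inside the same conda environment. The script below is only a diagnostic sketch; the function name describe_device is made up for illustration. It relies on torch.version.cuda, torch.cuda.is_available() and torch.cuda.get_device_name(), and it returns early if torch is not installed.

```python
import importlib.util


def describe_device() -> dict:
    """Collect basic facts about the PyTorch install and visible GPU."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,      # CUDA version torch was built with
        "cuda_available": False,
        "device_name": None,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # torch is missing from this environment

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_device().items():
        print(f"{key}: {value}")
```

If cuda_build is None, a CPU-only PyTorch build is installed, which would be consistent with both the Half error and the 300-second generation time. If cuda_build is set but cuda_available is False, the driver or CUDA runtime is the more likely place to look.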
This issue is stale because it has been open for 30 days with no activity.
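I ran into this problem too. I later switched to 2.0-0.5B, and then found that for voice cloning to work, the transcript of the recorded prompt audio has to be entered in the Prompt field.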