RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3
Describe the bug
When doing zero-shot voice cloning, running the following command:
output = cosyvoice.inference_zero_shot(temp_text, sound_clone_text, prompt_speech_16k)
frequently raises this error: RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3. How can this be fixed?
The full error output is:
min value is tensor(-1.0073)
max value is tensor(1.0222)
min value is tensor(-1.0073)
max value is tensor(1.0222)
Traceback (most recent call last):
  File "/home/jupyter/ollama_models/blob/mm/mytools240621/tool/cosyvoice1.py", line 53, in ttt3
    output = cosyvoice.inference_zero_shot(temp_text, sound_clone_text, prompt_speech_16k)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/cli/cosyvoice.py", line 60, in inference_zero_shot
    model_output = self.model.inference(**model_input)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/cli/model.py", line 40, in inference
    tts_speech_token = self.llm.inference(text=text.to(self.device),
  File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/llm/llm.py", line 196, in inference
    y_pred, att_cache, cnn_cache = self.llm.forward_chunk(lm_input, offset=0, required_cache_size=-1, att_cache=att_cache, cnn_cache=cnn_cache,
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/transformer/encoder.py", line 251, in forward_chunk
    xs, _, new_att_cache, new_cnn_cache = layer(
  File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/transformer/encoder_layer.py", line 93, in forward
    x_att, new_att_cache = self.self_attn(x, x, x, mask, pos_emb=pos_emb, cache=att_cache)
  File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/transformer/attention.py", line 323, in forward
    scores = (matrix_ac + matrix_bd) / math.sqrt(
RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3
Desktop (please complete the following information):
- OS: Ubuntu
Have you solved this problem? I encountered it too.
I encountered it too. GPU: 4090D
I have also encountered this problem today.
Can someone help solve it?
Same problem here. Is there a solution?
me too!!!
To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback
(Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)
y_pred, att_cache, cnn_cache = self.llm.forward_chunk(lm_input, offset=0, required_cache_size=-1, att_cache=att_cache, cnn_cache=cnn_cache,
Exception in thread Thread-12:
Traceback (most recent call last):
  File "/root/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/app/cosyvoice/cli/model.py", line 73, in llm_job
    for i in self.llm.inference(text=text.to(self.device),
  File "/root/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
  File "/app/cosyvoice/llm/llm.py", line 197, in inference
    y_pred, att_cache, cnn_cache = self.llm.forward_chunk(lm_input, offset=0, required_cache_size=-1, att_cache=att_cache, cnn_cache=cnn_cache,
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/torch/cosyvoice/transformer/encoder/___torch_mangle_25.py", line 1137, in fallback_cuda_fuser
    else:
      matrix_bd41 = matrix_bd
    scores = torch.div(torch.add(matrix_ac, matrix_bd41), 8.)
             ~~~~~~~~~ <--- HERE
    n_batch27 = torch.size(v28, 0)
    _442 = torch.gt(torch.size(att_mask, 2), 0)
Traceback of TorchScript, original code (most recent call last):
  File "/mnt/lyuxiang.lx/CosyVoice_github/cosyvoice/bin/../../cosyvoice/transformer/attention.py", line 327, in fallback_cuda_fuser
    matrix_bd = self.rel_shift(matrix_bd)
    scores = (matrix_ac + matrix_bd) / math.sqrt(
             ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        self.d_k)  # (batch, head, time1, time2)
RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3
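The PyTorch warning at the top of this log suggests a debugging step: disable the nvfuser codegen fallback path before rerunning inference. A minimal sketch follows; note this only changes what gets reported, it does not fix the shape mismatch itself.

```shell
# Disable PyTorch's nvfuser codegen fallback path, as the warning
# suggests, so the underlying error surfaces during debugging.
# Run this in the same shell before starting the inference script.
export PYTORCH_NVFUSER_DISABLE=fallback
```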
me too
Pay attention to how you segment the text. Don't push an entire passage that contains nothing but commas; make sure there are clear pauses, i.e. sentence-final punctuation such as periods or exclamation marks. That basically avoids this problem.
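The workaround above can be sketched as a simple pre-segmentation step that splits the input at sentence-final punctuation before each synthesis call. The `split_sentences` helper and its punctuation set are illustrative, not part of the CosyVoice API:

```python
import re

def split_sentences(text):
    """Split text after sentence-final punctuation (Chinese and Western),
    keeping each mark attached to its sentence, so every chunk fed to the
    model is short and ends with a clear pause."""
    # Lookbehind split: break the string right after 。！？ or .!?
    parts = re.split(r'(?<=[。！？.!?])', text)
    return [p.strip() for p in parts if p.strip()]

# Each short chunk is then synthesized separately, e.g.:
# for chunk in split_sentences(long_text):
#     output = cosyvoice.inference_zero_shot(chunk, sound_clone_text, prompt_speech_16k)
```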
Same problem here. Is there a solution?
Same problem. Is there a fix?