
RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3

Open mayi140611 opened this issue 1 year ago • 10 comments

Describe the bug When using zero-shot voice cloning, running the following command: output = cosyvoice.inference_zero_shot(temp_text, sound_clone_text, prompt_speech_16k) frequently raises this error: RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3. How can this be solved?

The error message is as follows:

min value is  tensor(-1.0073)
max value is  tensor(1.0222)
min value is  tensor(-1.0073)
max value is  tensor(1.0222)
Traceback (most recent call last):
  File "/home/jupyter/ollama_models/blob/mm/mytools240621/tool/cosyvoice1.py", line 53, in ttt3
    output = cosyvoice.inference_zero_shot(temp_text, sound_clone_text, prompt_speech_16k)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/cli/cosyvoice.py", line 60, in inference_zero_shot
    model_output = self.model.inference(**model_input)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/cli/model.py", line 40, in inference
    tts_speech_token = self.llm.inference(text=text.to(self.device),
  File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/llm/llm.py", line 196, in inference
    y_pred, att_cache, cnn_cache = self.llm.forward_chunk(lm_input, offset=0, required_cache_size=-1, att_cache=att_cache, cnn_cache=cnn_cache,
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/transformer/encoder.py", line 251, in forward_chunk
    xs, _, new_att_cache, new_cnn_cache = layer(
  File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/transformer/encoder_layer.py", line 93, in forward
    x_att, new_att_cache = self.self_attn(x, x, x, mask, pos_emb=pos_emb, cache=att_cache)
  File "/opt/conda/envs/cosyvoice/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jupyter/ollama_models/blob/mm/CosyVoice/cosyvoice/transformer/attention.py", line 323, in forward
    scores = (matrix_ac + matrix_bd) / math.sqrt(
RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3

Desktop (please complete the following information):

  • OS: Ubuntu

mayi140611 avatar Aug 02 '24 11:08 mayi140611

Have you solved this problem? I encountered it as well.

nickyi1990 avatar Aug 06 '24 15:08 nickyi1990

I encountered it as well. GPU: 4090D

jimling avatar Aug 07 '24 09:08 jimling

I have also encountered this problem today.

Btlmd avatar Aug 08 '24 14:08 Btlmd

Can someone help solve it?

mayi140611 avatar Aug 13 '24 03:08 mayi140611

Same problem here. Is there a solution?

abo123456789 avatar Sep 26 '24 04:09 abo123456789

me too!!!

To debug try disable codegen fallback path via setting the env variable export PYTORCH_NVFUSER_DISABLE=fallback (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)

  y_pred, att_cache, cnn_cache = self.llm.forward_chunk(lm_input, offset=0, required_cache_size=-1, att_cache=att_cache, cnn_cache=cnn_cache,
Exception in thread Thread-12:
Traceback (most recent call last):
  File "/root/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/cosyvoice/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/app/cosyvoice/cli/model.py", line 73, in llm_job
    for i in self.llm.inference(text=text.to(self.device),
  File "/root/anaconda3/envs/cosyvoice/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
  File "/app/cosyvoice/llm/llm.py", line 197, in inference
    y_pred, att_cache, cnn_cache = self.llm.forward_chunk(lm_input, offset=0, required_cache_size=-1, att_cache=att_cache, cnn_cache=cnn_cache,
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/torch/cosyvoice/transformer/encoder/___torch_mangle_25.py", line 1137, in fallback_cuda_fuser
    else:
      matrix_bd41 = matrix_bd
    scores = torch.div(torch.add(matrix_ac, matrix_bd41), 8.)
             ~~~~~~~~~ <--- HERE
    n_batch27 = torch.size(v28, 0)
    _442 = torch.gt(torch.size(att_mask, 2), 0)

Traceback of TorchScript, original code (most recent call last):
  File "/mnt/lyuxiang.lx/CosyVoice_github/cosyvoice/bin/../../cosyvoice/transformer/attention.py", line 327, in fallback_cuda_fuser
    matrix_bd = self.rel_shift(matrix_bd)

     scores = (matrix_ac + matrix_bd) / math.sqrt(
               ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
         self.d_k)  # (batch, head, time1, time2)

RuntimeError: The size of tensor a (5002) must match the size of tensor b (2) at non-singleton dimension 3
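The warning at the top of that traceback suggests disabling the nvfuser codegen fallback path so the underlying error surfaces instead of the fused-kernel fallback. A minimal way to do that before launching inference (the Python entry script here is just a placeholder for your own):

```shell
# Disable the TorchScript nvfuser codegen fallback, as the warning suggests,
# so the original error is reported instead of the fused fallback's.
export PYTORCH_NVFUSER_DISABLE=fallback

# then run your inference script in the same shell, e.g.:
# python your_cosyvoice_script.py
```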

zhoufengen avatar Oct 16 '24 06:10 zhoufengen

me too

gaspire avatar Dec 05 '24 08:12 gaspire

Pay attention to how you split sentences: don't push a long passage that contains nothing but commas. Make sure there are clear pauses, i.e. sentence-ending punctuation such as periods or exclamation marks. This basically avoids the problem.

panjie-payne avatar Dec 12 '24 07:12 panjie-payne
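Following the suggestion above, one hedged sketch of pre-splitting the input is shown below. The helper name `split_sentences` and the length threshold are my own assumptions, not part of CosyVoice; each resulting chunk would then be passed to `cosyvoice.inference_zero_shot(...)` separately:

```python
import re

def split_sentences(text, max_len=80):
    """Split text at sentence-ending punctuation so every chunk ends in a
    clear pause; over-long comma-only runs are further split at commas.
    (Hypothetical helper; max_len=80 is an arbitrary assumption.)"""
    # Split *after* Chinese/Western sentence terminators, keeping them.
    parts = [p for p in re.split(r'(?<=[。！？.!?])', text) if p.strip()]
    chunks = []
    for p in parts:
        if len(p) <= max_len:
            chunks.append(p)
        else:
            # Fallback: break an over-long comma-only run at the commas.
            chunks.extend(s for s in re.split(r'(?<=[，,])', p) if s.strip())
    return chunks
```

Each chunk could then be synthesized in a loop, e.g. `for chunk in split_sentences(temp_text): cosyvoice.inference_zero_shot(chunk, sound_clone_text, prompt_speech_16k)`, instead of pushing one long comma-separated passage through a single call.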

Same problem here. Is there a solution?

luhairong11 avatar Dec 18 '24 07:12 luhairong11

Same problem. Is there a fix?

gillbates avatar May 21 '25 09:05 gillbates