FunASR
PyTorch tensor issue
❓ Questions and Help
What is your question?
My question: while using FunASR, the following error suddenly appeared: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list
Code
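from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    # spk_model="cam++",
)
# output_path is the path to the input audio file, defined elsewhere in my script
res = model.generate(input=output_path, batch_size_s=300, hotword='魔搭')
print(res)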
What have you tried?
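It worked for a while: at first speech was recognized normally, then the error suddenly appeared. I have not restarted the script, because I want to investigate the problem first. I retried audio that had previously been recognized successfully, and it now fails with the same error. Searching online suggests this is a PyTorch tensor problem, but nothing in my own code seems to involve that.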
What's your environment?
- OS: win10
- FunASR Version: 1.0.25
- ModelScope Version : 1.14.0
- PyTorch Version: pytorch-wpe version 0.0.1
- How you installed funasr (pip, source): installed directly with pip install
- Python version: 3.11.7
Running on CPU, no GPU.
Could someone please take a look? Thanks!
Please show the detailed error logs and upload the wav file.
I hit the same issue when using the Cantonese model. Here is the full log, @LauraGPT:
Sizes of tensors must match except in dimension 2. Expected size 1 but got size 2 for tensor number 1 in the list.
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 457, in forward_one_step
x = torch.cat((x, pre_acoustic_embeds), dim=-1)
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/scama/decoder.py", line 419, in score
logp, state = self.forward_one_step(
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 176, in score_full
scores[k], states[k] = d.score(
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 309, in search
scores, states = self.score_full(
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/beam_search.py", line 410, in forward
best = self.search(
File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/miniconda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/models/uniasr/model.py", line 996, in inference
nbest_hyps = self.beam_search(
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 285, in inference
res = model.inference(**batch, **kwargs)
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 394, in inference_with_vad
results = self.inference(
File "/home/user/miniconda/lib/python3.9/site-packages/funasr/auto/auto_model.py", line 248, in generate
return self.inference_with_vad(input, input_len=input_len, **cfg)
File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/audio/funasr/model.py", line 61, in forward
output = self.model.generate(*args, **kwargs)
File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/models/base/base_model.py", line 35, in __call__
return self.postprocess(self.forward(*args, **kwargs))
File "/home/user/miniconda/lib/python3.9/site-packages/modelscope/pipelines/audio/funasr_pipeline.py", line 73, in __call__
output = self.model(*args, **kwargs)
File "/data/tts/sovits/GPT-SoVITS/tools/asr/funasr_cantonese.py", line 35, in <module>
rec_result = inference_pipeline(input="/data/tts/sovits/audio_res/e1/12_4.wav")
File "/home/user/miniconda/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/miniconda/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1 but got size 2 for tensor number 1 in the list.
Code I used:
from funasr import AutoModel
path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad = "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc = "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"
model = AutoModel(
    model=path_asr,
    vad_model=path_vad,
    vad_model_revision="v2.0.4",
    punc_model=path_punc,
    punc_model_revision="v2.0.4",
)
res = model.generate(
    input="/data/tts/sovits/audio_res/e1/12_4.wav"  # Failed
    # input="/data/tts/sovits/audio_res/e1/12_12.wav"  # Success
)
print(res)
Here is the audio file I used: Desktop.zip
The audio file that fails in my code is processed successfully by the online demo on ModelScope: https://www.modelscope.cn/models/iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/summary. Maybe a recent update broke some functionality?
After some digging, I found that UniASR does not seem to handle the case where batch size > 1. When the VAD model is enabled and splits the audio into pieces, the error is triggered. A temporary workaround is to disable the VAD model.
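To make the error message concrete, here is a minimal pure-Python sketch of the shape rule that `torch.cat` enforces: every dimension except the concatenation dimension must match across inputs. The helper name `check_cat_shapes` and the example shapes are illustrative assumptions, not taken from FunASR; they only model one input built for a single item meeting another built for two.

```python
from typing import Optional, Sequence, Tuple


def check_cat_shapes(shapes: Sequence[Tuple[int, ...]], dim: int) -> Optional[str]:
    """Return None if the shapes can be concatenated along `dim`,
    otherwise a description of the first mismatch found."""
    ref = shapes[0]
    ndim = len(ref)
    cat_dim = dim % ndim  # normalise negative dims such as -1
    for idx, shape in enumerate(shapes[1:], start=1):
        if len(shape) != ndim:
            return f"tensor {idx} has {len(shape)} dims, expected {ndim}"
        for axis, (want, got) in enumerate(zip(ref, shape)):
            # Only the concatenation axis is allowed to differ.
            if axis != cat_dim and want != got:
                return (f"mismatch in dimension {axis}: expected size {want} "
                        f"but got size {got} for tensor number {idx}")
    return None


# Hypothetical shapes (batch, time, features); the real FunASR shapes may differ.
x_shape = (1, 1, 512)
embeds_shape = (2, 1, 512)
print(check_cat_shapes([x_shape, embeds_shape], dim=-1))      # reports a mismatch
print(check_cat_shapes([(2, 1, 512), embeds_shape], dim=-1))  # None: compatible
```

Unlike PyTorch's message, which names the concatenation dimension, this sketch reports the axis that actually disagrees, which can help when checking whether a batch dimension is the culprit.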
@kexul Do you mean this is a problem with the VAD model, and it works if VAD is not used?
Yes, on my side everything runs once VAD is turned off. You can give it a try.
@kexul Thanks, I'll try it.
@linrb685 If you still want VAD and punctuation, you can run them manually 🤣:
import soundfile
from pathlib import Path
from funasr import AutoModel

path_asr = "iic/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online"
path_vad = "iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"
path_punc = "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"

# Load the three models separately instead of chaining them in one AutoModel
model = AutoModel(model=path_asr)
vad_model = AutoModel(model=path_vad)
punc_model = AutoModel(model=path_punc)

for item in Path('.').glob('*.wav'):
    print(str(item))
    # Recognize the whole file without VAD
    text = model.generate(input=str(item))[0]['text']
    print(text)
    # Run VAD on its own; each span is used below as [start_ms, end_ms]
    res_vad = vad_model.generate(input=str(item))[0]['value']
    wav, sr = soundfile.read(str(item))
    full_text = []
    for span in res_vad:
        # Convert milliseconds to sample indices and cut out the segment
        wav_span = wav[int(span[0]*sr/1000):int(span[1]*sr/1000)]
        # soundfile.write returns None, so there is nothing to assign
        soundfile.write('temp.wav', wav_span, sr)
        # Recognize one segment at a time, so the batch size stays at 1
        text = model.generate(input='temp.wav')[0]['text']
        full_text.append(text)
    full_text = ' '.join(full_text)
    # Restore punctuation on the joined transcript
    punc_text = punc_model.generate(input=full_text)[0]['text']
    print(punc_text)
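The slicing step in the workaround above converts VAD spans into sample indices. Below is a small stdlib-only sketch of that conversion, assuming (as the code above does) that each span is a `[start_ms, end_ms]` pair in integer milliseconds. The helper name `spans_to_sample_ranges` is hypothetical, and the clamping and empty-span filtering are additions for safety, not something the original snippet does.

```python
from typing import List, Sequence, Tuple


def spans_to_sample_ranges(
    spans_ms: Sequence[Sequence[int]],
    sample_rate: int,
    num_samples: int,
) -> List[Tuple[int, int]]:
    """Convert [start_ms, end_ms] spans to (start, end) sample index ranges,
    clamped to the audio length; empty ranges are dropped."""
    ranges = []
    for start_ms, end_ms in spans_ms:
        # Integer arithmetic avoids floating-point rounding surprises.
        start = max(0, start_ms * sample_rate // 1000)
        end = min(num_samples, end_ms * sample_rate // 1000)
        if end > start:
            ranges.append((start, end))
    return ranges


# Example: a 3-second clip at 16 kHz with two hypothetical VAD spans.
print(spans_to_sample_ranges([[0, 1500], [2000, 3250]], 16000, 48000))
# -> [(0, 24000), (32000, 48000)]
```

Each returned pair can be used directly as `wav[start:end]`. Clamping guards against a span that extends slightly past the end of the file.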
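@kexul Thanks a lot. VAD is not a must for us, but we may consider adding it. We haven't run into the error so far, though more testing is needed.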