FunASR

RuntimeError during inference: Expected 2D or 3D (batch mode) tensor with possibly 0 batch size and other non-zero dimensions for input, but got: [1, 0, 0]

Open sethws opened this issue 1 year ago • 0 comments

| 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/workspace/FunASR/examples/industrial_data_pretraining/sense_voice/deno2.py", line 28, in <module>
    res = model.generate(
          ^^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 263, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 417, in inference_with_vad
    results = self.inference(
              ^^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 302, in inference
    res = model.inference(**batch, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 832, in inference
    speech, speech_lengths = extract_fbank(
                             ^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/funasr/utils/load_utils.py", line 173, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/funasr/frontends/wav_frontend.py", line 134, in forward
    mat = kaldi.fbank(
          ^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/torchaudio/compliance/kaldi.py", line 600, in fbank
    strided_input, signal_log_energy = _get_window(
                                       ^^^^^^^^^^^^
  File "/usr/bin/anaconda3/envs/Whisper-Finetune/lib/python3.11/site-packages/torchaudio/compliance/kaldi.py", line 195, in _get_window
    offset_strided_input = torch.nn.functional.pad(strided_input.unsqueeze(0), (1, 0), mode="replicate").squeeze(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected 2D or 3D (batch mode) tensor with possibly 0 batch size and other non-zero dimensions for input, but got: [1, 0, 0]

How can I fix this? Some audio files work, while others raise this error.

sethws, Jul 31 '24 09:07
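One possible reading of the traceback above: the `[1, 0, 0]` shape reported by `torch.nn.functional.pad` suggests that `kaldi.fbank` was given a waveform too short to form even a single analysis frame, so the framed input was empty. This would fit the error appearing only for some audio, for example when a VAD segment is extremely short. The sketch below is a hypothetical pre-check, not part of FunASR or torchaudio. It assumes a 16 kHz sample rate and a 25 ms frame length (check the model's frontend configuration for the real values), and the helper names are made up for illustration.

```python
from typing import List, Tuple


def min_samples_for_one_frame(sample_rate: int = 16000,
                              frame_length_ms: float = 25.0) -> int:
    """Number of samples needed to fill one analysis window.

    Assumption: the frontend derives its window size as
    sample_rate * frame_length_ms / 1000, truncated to an integer.
    """
    return int(sample_rate * frame_length_ms * 0.001)


def is_long_enough(num_samples: int,
                   sample_rate: int = 16000,
                   frame_length_ms: float = 25.0) -> bool:
    """True if a waveform with num_samples samples can yield at least one frame."""
    return num_samples >= min_samples_for_one_frame(sample_rate, frame_length_ms)


def split_by_length(sample_counts: List[int],
                    sample_rate: int = 16000,
                    frame_length_ms: float = 25.0) -> Tuple[List[int], List[int]]:
    """Return (usable_indices, too_short_indices) for a list of segment lengths."""
    usable: List[int] = []
    too_short: List[int] = []
    for index, count in enumerate(sample_counts):
        if is_long_enough(count, sample_rate, frame_length_ms):
            usable.append(index)
        else:
            too_short.append(index)
    return usable, too_short


if __name__ == "__main__":
    # Hypothetical segment lengths in samples; 0 and 120 are shorter than one frame.
    usable, too_short = split_by_length([48000, 0, 120, 400])
    print("usable:", usable)
    print("too short:", too_short)
```

If the too-short list is non-empty for the failing files, skipping or padding those segments before feature extraction is one thing to try. Whether that is the actual cause here is not confirmed by the report.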