FunASR
SenseVoice: `use_itn=False` raises `IndexError: index 2 is out of bounds for dimension 1 with size 2`
🐛 Bug
Traceback (most recent call last):
File "/home/xnne/code/XnneHangLab/.venv/bin/test", line 10, in <module>
sys.exit(main())
^^^^^^
File "/home/xnne/code/XnneHangLab/src/uiya/test.py", line 101, in main
sense_text,sense_timestamp = generate_sense_voice_results(model=model, input_path=Path("./tests/数字人生.wav"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xnne/code/XnneHangLab/src/uiya/utils/model.py", line 96, in generate_sense_voice_results
res = model.generate(
^^^^^^^^^^^^^^^
File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 306, in generate
return self.inference_with_vad(input, input_len=input_len, **cfg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 464, in inference_with_vad
results = self.inference(
^^^^^^^^^^^^^^^
File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 345, in inference
res = model.inference(**batch, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 925, in inference
align = ctc_forced_align(
^^^^^^^^^^^^^^^^^
File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/models/sense_voice/utils/ctc_alignment.py", line 45, in ctc_forced_align
best_score[:, padding_num + 0] = log_probs[:, 0, blank]
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
IndexError: index 2 is out of bounds for dimension 1 with size 2
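One way to read the final frame, offered as a hedged sketch rather than a confirmed diagnosis: the failing write targets column `padding_num + 0`, and the reported index is 2, so `padding_num` appears to be 2. Dimension 1 of the buffer is also reported as size 2, which would leave room for the padding columns only and none for the target sequence. The snippet below mirrors that arithmetic in plain Python. The assumption that the buffer width equals `padding_num` plus the length of the expanded target sequence is not confirmed against `ctc_alignment.py`, and both function names are hypothetical.

```python
# Inferred from the traceback: the failing index is padding_num + 0 == 2.
PADDING_NUM = 2


def score_buffer_width(expanded_target_len: int) -> int:
    """Width of the score buffer along dimension 1.

    Assumption (not verified against ctc_alignment.py):
    width = padding columns + expanded target length.
    """
    return PADDING_NUM + expanded_target_len


def first_write_in_bounds(expanded_target_len: int) -> bool:
    """True if writing to column padding_num + 0 stays inside the buffer."""
    return PADDING_NUM + 0 < score_buffer_width(expanded_target_len)


for n in (0, 1, 5):
    print(n, score_buffer_width(n), first_write_in_bounds(n))
```

Under that assumption, only an empty expanded target sequence gives a width of 2, which would point at an audio segment for which there were no tokens to align.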
To Reproduce
Use the sample file 数字人生.wav:
https://pan.baidu.com/s/1wkVsOyrWxhhywIryrZN4fA?pwd=dby6
or download it from here:
https://github.com/MrXnneHang/Subtitle-Generator-Examples/blob/dev/example4.wav
Code sample
from funasr import AutoModel

model = AutoModel(
    model=self.settings.sense_voice_model,
    vad_model=self.settings.vad_model,  # VAD splits the audio into segments
    vad_kwargs={"max_single_segment_time": 30000},
    device=self.device,
    disable_update=True,
)
res = model.generate(
    input=str(input_path),  # long audio, over 1 minute
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=False,
    batch_size_s=60,
    output_timestamp=True,  # before the fix, model.py raises an error when VAD and timestamp output are both enabled
)
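Until the alignment path is fixed, a caller-side stopgap could look like the sketch below. It assumes the `IndexError` is raised only on the timestamp alignment path, so retrying with `output_timestamp=False` would succeed; that is an inference from the traceback and has not been verified. The helper name `generate_with_timestamp_fallback` is hypothetical and not part of FunASR; it only calls `model.generate` with whatever keyword arguments it is given.

```python
def generate_with_timestamp_fallback(model, **generate_kwargs):
    """Call model.generate; on IndexError, retry once without timestamps.

    Returns (results, has_timestamps). Hypothetical helper, not a FunASR API.
    """
    try:
        return model.generate(**generate_kwargs), bool(
            generate_kwargs.get("output_timestamp")
        )
    except IndexError:
        # Only retry when timestamps were requested; otherwise the error
        # has a different cause and should surface unchanged.
        if not generate_kwargs.get("output_timestamp"):
            raise
        retry_kwargs = dict(generate_kwargs, output_timestamp=False)
        return model.generate(**retry_kwargs), False
```

With the code sample above, this would be called as `generate_with_timestamp_fallback(model, input=str(input_path), cache={}, language="auto", use_itn=False, batch_size_s=60, output_timestamp=True)`. The trade-off is that the retry silently drops timestamps, so the second return value should be checked before using them.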
Expected behavior
Environment
====== Version =======
funasr:1.2.6
torch:2.2.0+cpu
torchaudio:2.2.0+cpu
xnne@xnne-PC
------------
OS: Deepin 23 x86_64
Host: OMEN by HP Laptop 16-b1xxx
Kernel: 6.6.59-amd64-desktop-hwe
Uptime: 13 hours, 13 mins
Packages: 2324 (dpkg)
Shell: bash 5.2.21
Resolution: 1920x1080, 1920x1080
DE: DDE
WM: KWin
Theme: deepin-dark [GTK2], Adwaita [GTK3]
Icons: flow [GTK2], Adwaita [GTK3]
Terminal: deepin-terminal
CPU: 12th Gen Intel i7-12700H (20) @ 4.600GHz
GPU: NVIDIA GeForce RTX 3060 Mobile / Max-Q
GPU: Intel Alder Lake-P GT2 [Iris Xe Graphics]
Memory: 7531MiB / 15650MiB