SenseVoice: `use_itn=False` raises `IndexError: index 2 is out of bounds for dimension 1 with size 2`

Open MrXnneHang opened this issue 7 months ago • 0 comments

🐛 Bug

Traceback (most recent call last):
  File "/home/xnne/code/XnneHangLab/.venv/bin/test", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/xnne/code/XnneHangLab/src/uiya/test.py", line 101, in main
    sense_text,sense_timestamp = generate_sense_voice_results(model=model, input_path=Path("./tests/数字人生.wav"))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xnne/code/XnneHangLab/src/uiya/utils/model.py", line 96, in generate_sense_voice_results
    res = model.generate(
          ^^^^^^^^^^^^^^^
  File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 306, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 464, in inference_with_vad
    results = self.inference(
              ^^^^^^^^^^^^^^^
  File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/models/sense_voice/model.py", line 925, in inference
    align = ctc_forced_align(
            ^^^^^^^^^^^^^^^^^
  File "/home/xnne/code/XnneHangLab/.venv/lib/python3.11/site-packages/funasr/models/sense_voice/utils/ctc_alignment.py", line 45, in ctc_forced_align
    best_score[:, padding_num + 0] = log_probs[:, 0, blank]
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
IndexError: index 2 is out of bounds for dimension 1 with size 2
  0%|          | 0/1 [00:00<?, ?it/s]
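
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing line writes to column `padding_num + 0` of `best_score`, and the error says that dimension has size 2. If `padding_num` is 2 and the buffer width is `padding_num` plus the length of a blank-interleaved target sequence, then an empty target sequence (for example, a VAD segment that decodes to no tokens) would produce exactly this error. The pure-Python model below only illustrates that arithmetic. The names `expand_with_blanks` and `init_best_score`, the value `padding_num=2`, and the way the trailing blank is added are all assumptions, not FunASR code.

```python
def expand_with_blanks(targets, blank=0):
    """Interleave blanks with targets: [a, b] -> [blank, a, blank, b, blank].

    Assumption: the trailing blank is derived from a slice of the targets,
    so an empty target list expands to an empty list.
    """
    expanded = []
    for token in targets:
        expanded.extend([blank, token])
    expanded.extend([blank] * len(targets[:1]))
    return expanded


def init_best_score(targets, padding_num=2):
    """Allocate a score row and seed the first real column.

    Hypothetical stand-in for the buffer set up before forced alignment.
    """
    expanded = expand_with_blanks(targets)
    width = padding_num + len(expanded)
    best_score = [float("-inf")] * width
    # With no targets, width == padding_num, so this index is out of range.
    best_score[padding_num + 0] = 0.0
    return best_score


if __name__ == "__main__":
    print(len(init_best_score([5, 7])))
    try:
        init_best_score([])
    except IndexError as exc:
        print("empty targets:", exc)
```

If this reading is right, checking whether any segment yields an empty token sequence before alignment would be a natural place to look.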

To Reproduce

Use the sample audio file 数字人生.wav:

https://pan.baidu.com/s/1wkVsOyrWxhhywIryrZN4fA?pwd=dby6

or from here:

https://github.com/MrXnneHang/Subtitle-Generator-Examples/blob/dev/example4.wav

Code sample

from funasr import AutoModel

model = AutoModel(
    model=self.settings.sense_voice_model,
    vad_model=self.settings.vad_model,  # VAD is used to split the audio into segments
    vad_kwargs={"max_single_segment_time": 30000},
    device=self.device,
    disable_update=True,
)
res = model.generate(
    input=str(input_path),  # long audio, over 1 minute
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=False,
    batch_size_s=60,
    output_timestamp=True,  # before the fix: enabling VAD together with timestamp output raises an error in model.py
)
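
Until the underlying problem is fixed, a caller could guard the call above with a fallback. This is a minimal sketch of a workaround, not a fix, and it assumes (unverified) that the `IndexError` is only reached when `output_timestamp=True`. The helper name `generate_with_timestamp_fallback` is hypothetical; it accepts any callable so it does not depend on FunASR internals.

```python
from typing import Any, Callable, Tuple


def generate_with_timestamp_fallback(
    generate: Callable[..., Any], **kwargs: Any
) -> Tuple[Any, bool]:
    """Try `generate` with timestamps; on IndexError retry without them.

    Returns (result, timestamps_available).
    """
    try:
        return generate(**{**kwargs, "output_timestamp": True}), True
    except IndexError:
        # Assumption: the alignment step is skipped when timestamps are off.
        return generate(**{**kwargs, "output_timestamp": False}), False
```

With the objects from the code sample, this could be called as `res, has_timestamps = generate_with_timestamp_fallback(model.generate, input=str(input_path), cache={}, language="auto", use_itn=False, batch_size_s=60)`. The cost is that a failing input is processed twice and comes back without timestamps.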

Expected behavior

Environment

====== Version =======
funasr:1.2.6  
torch:2.2.0+cpu
torchaudio:2.2.0+cpu  
xnne@xnne-PC
------------
OS: Deepin 23 x86_64
Host: OMEN by HP Laptop 16-b1xxx
Kernel: 6.6.59-amd64-desktop-hwe
Uptime: 13 hours, 13 mins
Packages: 2324 (dpkg)
Shell: bash 5.2.21
Resolution: 1920x1080, 1920x1080
DE: DDE
WM: KWin
Theme: deepin-dark [GTK2], Adwaita [GTK3]
Icons: flow [GTK2], Adwaita [GTK3]
Terminal: deepin-terminal
CPU: 12th Gen Intel i7-12700H (20) @ 4.600GHz
GPU: NVIDIA GeForce RTX 3060 Mobile / Max-Q
GPU: Intel Alder Lake-P GT2 [Iris Xe Graphics]
Memory: 7531MiB / 15650MiB

MrXnneHang, Apr 03 '25 02:04