FunASR

1.2.4: Running the official example on CUDA with output_timestamp raises an error

Open yqw1996 opened this issue 8 months ago • 1 comment

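With FunASR 1.2.4, running the official example on CUDA with output_timestamp added raises the error below (VAD enabled). The error does not occur on CPU. On CPU, however, some long audio files (longer than 30 minutes) show other problems: with use_itn enabled, the timestamp length does not match the text length; with use_itn disabled, the case targets.size(-1) == 0 in funasr/models/sense_voice/utils/ctc_alignment.py is hit and raises an error. Please fix this.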
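The two CPU-side symptoms described above (a timestamp list whose length differs from the text, and an empty target sequence) can be detected before any downstream code relies on the alignment. The following is a minimal sketch, not FunASR code: `check_timestamps` is a hypothetical helper, and it assumes the tokens are available as a list and the timestamps as a list of `(start, end)` pairs, which may not match the actual result layout in 1.2.4.

```python
def check_timestamps(tokens, timestamps):
    """Return a list of problems found; an empty list means the pair looks consistent.

    Assumption (not taken from FunASR): `tokens` is a list of token strings and
    `timestamps` is a list of (start, end) pairs, one per token.
    """
    problems = []
    if len(tokens) == 0:
        # Mirrors the targets.size(-1) == 0 situation: nothing to align.
        problems.append("no target tokens: forced alignment has nothing to align")
    if len(tokens) != len(timestamps):
        problems.append(
            f"length mismatch: {len(tokens)} tokens vs {len(timestamps)} timestamps"
        )
    for i, (start, end) in enumerate(timestamps):
        if end < start:
            problems.append(f"timestamp {i} ends before it starts: ({start}, {end})")
    return problems


# Example: two tokens but only one timestamp.
print(check_timestamps(["hello", "world"], [(0, 480)]))
```

Running such a check per VAD segment would help narrow down which segment of a long recording produces the mismatch.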

🐛 Bug

/usr/local/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Notice: ffmpeg is not installed. torchaudio is used to load audio
If you want to use ffmpeg backend to load audio, please install it by:
    sudo apt install ffmpeg # ubuntu
    # brew install ffmpeg # mac
funasr version: 1.2.4.
Check update of funasr, and it would cost few times. You may disable it by set disable_update=True in AutoModel
Loading remote code failed: ./model.py, No module named 'model'
rtf_avg: 0.007: 100%|█████████████████████████████| 1/1 [00:00<00:00, 7.52it/s]
  0%| | 0/1 [00:00<?, ?it/s]
  0%| | 0/2 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [1,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "/app/yqw_python_mapping/python_project/asr/stream_voice/funasr_demo.py", line 18, in <module>
    res = model.generate(
  File "/usr/local/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 306, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/usr/local/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 464, in inference_with_vad
    results = self.inference(
  File "/usr/local/lib/python3.10/site-packages/funasr/auto/auto_model.py", line 345, in inference
    res = model.inference(**batch, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/funasr/models/sense_voice/model.py", line 932, in inference
    pred = groupby(align[0, : encoder_out_lens[0]])
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
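The traceback ends at a `groupby` call over the frame-level alignment. As the log itself notes, CUDA errors may be reported asynchronously, so the out-of-bounds indexing may have happened earlier than this line. To illustrate what that step appears to do, here is a minimal pure-Python sketch that collapses per-frame labels into token spans using `itertools.groupby`. It is not FunASR's implementation: `collapse_alignment` is a hypothetical name, and treating label 0 as the CTC blank is an assumption.

```python
from itertools import groupby


def collapse_alignment(frame_labels, blank_id=0):
    """Collapse per-frame labels into (label, start_frame, end_frame) runs.

    Consecutive identical labels form one run; runs of `blank_id` are skipped.
    `blank_id=0` is an assumption made for this illustration.
    """
    spans = []
    start = 0
    for label, run in groupby(frame_labels):
        end = start + len(list(run))  # end frame is exclusive
        if label != blank_id:
            spans.append((label, start, end))
        start = end
    return spans


# Nine frames: blanks, a run of token 7, a blank, a run of token 3, a blank.
frames = [0, 0, 7, 7, 7, 0, 3, 3, 0]
print(collapse_alignment(frames))  # [(7, 2, 5), (3, 6, 8)]
```

Each non-blank run yields one span, so the number of spans should equal the number of target tokens; a difference between the two corresponds to the timestamp/text length mismatch reported above.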

Code sample

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zn", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  #
    merge_length_s=15,
    output_timestamp=True,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
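When rerunning the sample above to localize the failure, the log's own suggestion of `CUDA_LAUNCH_BLOCKING=1` can be applied from inside the script. A minimal sketch follows, assuming the variable is set before any CUDA work starts (safest is before importing torch or funasr); `enable_blocking_cuda_errors` is a hypothetical helper name.

```python
import os


def enable_blocking_cuda_errors():
    """Request synchronous CUDA kernel launches so errors surface at the failing call.

    This only takes effect if it runs before CUDA is initialized, so call it
    at the very top of the script, before importing torch or funasr.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_blocking_cuda_errors()
# from funasr import AutoModel  # import only after the variable is set
```

With synchronous launches the traceback should point at the operation that actually performs the out-of-bounds indexing, at the cost of slower execution.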

Environment

  • OS (e.g., Linux): linux
  • FunASR Version (e.g., 1.0.0): 1.2.4
  • ModelScope Version (e.g., 1.11.0): 1.18.1
  • PyTorch Version (e.g., 2.0.0): 2.4.1
  • How you installed funasr (pip, source): pip
  • Python version: 3.10.13
  • GPU (e.g., V100M32): A10
  • CUDA/cuDNN version (e.g., cuda11.7): 12.4.1
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1): not docker

yqw1996, Mar 06 '25 08:03