SenseVoice

With the following code, WAV input of any length is transcribed, but MP3 input is only recognized for a very short portion. What's going on?

Open jinwater88 opened this issue 4 months ago • 0 comments

Notice: In order to resolve issues more efficiently, please raise issues following the template. (Note: to get your problem resolved more efficiently, please ask following the template and fill in the details.)

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

Code

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
import time

# Local model checkpoints
model_dir = "./funasr_models/iic/SenseVoiceSmall"
vad_model_dir = "./funasr_models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch"

s_time = time.time()
model = AutoModel(
    model=model_dir,
    trust_remote_code=False,
    remote_code="./model.py",
    # vad_model=vad_model_dir,  # VAD segmentation is disabled in this run
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)
print(model.model_path)
load_time = time.time()
print(f"模型加载时间: {time.time() - s_time:.2f}秒")  # model load time (seconds)

# en
# input_file = f"{model.model_path}/example/en.mp3"
input_file = f"./data/像我这样的人-毛不易#hxmnf.mp3"
res = model.generate(
    input=input_file,
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    # batch_size_s=60,
    merge_vad=True,
    # merge_length_s=15,
)
print(res)
text = rich_transcription_postprocess(res[0]["text"])
print(text)
print(f"推理时间: {time.time() - load_time:.2f}秒")  # inference time (seconds)
```

Output (模型加载时间 = model load time, 推理时间 = inference time):

```
funasr version: 1.2.7.
Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
You are using the latest version of funasr-1.2.7
WARNING:root:trust_remote_code: False
./funasr_models/iic/SenseVoiceSmall
模型加载时间: 4.34秒
rtf_avg: 0.007: 100%|███████████████| 1/1 [00:01<00:00, 1.45s/it]
[{'key': '像我这样的人-毛不易#hxmnf', 'text': '<|zh|><|SAD|><|BGM|><|withitn|>优这样迷茫多少人像我这样孤单的人迷茫的人这样碌碌无为的人过多少人像我这样孤单的人这样不甘平凡的人世界上有多少人这样莫名其妙。'}]
🎼优这样迷茫多少人像我这样孤单的人迷茫的人这样碌碌无为的人过多少人像我这样孤单的人这样不甘平凡的人世界上有多少人这样莫名其妙。😔
推理时间: 1.46秒
```
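One thing worth ruling out is audio length rather than container format: the `vad_model=vad_model_dir` line is commented out in the code above, so the whole file is likely passed to SenseVoiceSmall as a single segment, whereas the `vad_kwargs` setting would cap each VAD segment at 30000 ms if VAD were enabled. The sketch below is a minimal, hypothetical check using only Python's standard `wave` module, so it reads WAV files only (for example the converted copy). The helper names `wav_duration_s` and `exceeds_single_segment` and the 30 s threshold constant are illustrative assumptions, not part of FunASR.

```python
import wave

# Illustrative threshold mirroring vad_kwargs={"max_single_segment_time": 30000} (milliseconds).
MAX_SEGMENT_MS = 30000


def wav_duration_s(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        # Total frames divided by frames per second gives the length in seconds.
        return wf.getnframes() / wf.getframerate()


def exceeds_single_segment(path: str, max_ms: int = MAX_SEGMENT_MS) -> bool:
    """Return True if the file is longer than one VAD segment would be allowed to be."""
    return wav_duration_s(path) * 1000 > max_ms
```

If a file that transcribes only partially turns out to be well over 30 s while the files that transcribe fully are shorter, that would point at segmentation rather than at MP3 decoding.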

What have you tried?

What's your environment?

  • OS (e.g., Linux): Ubuntu 22.04
  • FunASR Version (e.g., 1.0.0): 1.2.7 (funasr-onnx: 0.4.1)
  • ModelScope Version (e.g., 1.11.0): 1.29.1
  • PyTorch Version (e.g., 2.0.0): 2.6.0
  • How you installed funasr (pip, source): pip
  • Python version: 3.10
  • GPU (e.g., V100M32): RTX 4070
  • CUDA/cuDNN version (e.g., cuda11.7): CUDA 12.4
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
  • Any other relevant information: With the code above, WAV input of any length is transcribed, but MP3 input only yields a very short transcript. What's going on? I also converted the MP3 to WAV and ran the same code: the output was identical, and the full text still was not produced.
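For comparison, a long recording can also be kept under a per-segment limit by cutting it into fixed windows before transcription. The sketch below is a naive fixed-length splitter, not FunASR's FSMN VAD (which cuts at detected speech boundaries); it only illustrates how a 30000 ms cap maps to sample indices at 16 kHz, the rate suggested by the VAD model name `speech_fsmn_vad_zh-cn-16k-common-pytorch`. The function `split_fixed` is hypothetical.

```python
from typing import List, Tuple


def split_fixed(num_samples: int, sample_rate: int = 16000, max_ms: int = 30000) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index pairs covering num_samples in windows of at most max_ms.

    Naive fixed-window segmentation (hypothetical helper), not voice activity detection.
    """
    window = sample_rate * max_ms // 1000  # samples per window (480000 for 30 s at 16 kHz)
    # The last window is clipped to the end of the signal.
    return [(start, min(start + window, num_samples)) for start in range(0, num_samples, window)]
```

Each window would then be transcribed separately and the texts joined. Fixed windows can cut through the middle of a word, which is why segmenting at speech boundaries with a VAD model is generally the better choice.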

jinwater88 · Aug 26 '25 05:08