
A Fundamental End-to-End Speech Recognition Toolkit with Open-Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.

555 FunASR issues

Does FunASR support speech_campplus_speaker-diarization_common?

question

## ❓ When using SenseVoice, how do I enable punctuation but disable inverse text normalization (ITN)?

## Code

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"
model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    ...
```

question

#### What is your question?

Using the Paraformer Chinese general 16k offline large long-audio model (https://modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch), I fine-tuned with 20 hours of data and completed the quantized export and testing on the fine-tuning server, where the results were very good. However, after using the quantized weight file to replace the weight file under the corresponding quantized model inside docker and restarting, the output quality is worse than in those tests. Do the dependent VAD, punc, and LM models also need to be fine-tuned with the same data?

#### Code

#### What have you tried?

I synced all of the model's dictionary and config files into the corresponding model directory in docker, replacing the identical files. Conversely, when I imported the long-audio quantized model from docker onto the server and replaced its weights with the fine-tuned, quantized weight file, the results were very good.

#### What's your environment?

- OS (e.g., Linux):
- FunASR Version (e.g., 1.0.0): 1.0.12
- ModelScope...

question

```python
from funasr import AutoModel
import time

wav_file = "/mnt/data/toolbox_dir/voice_trans/test-file/vad_example.wav"
model = AutoModel(
    model="/mnt/data/toolbox_dir/voice_trans/Whisper-large-v3",
    vad_model="/mnt/data/toolbox_dir/voice_trans/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_kwargs={"max_single_segment_time": 30000},
    punc_model="/mnt/data/toolbox_dir/voice_trans/punc_ct-transformer_cn-en-common-vocab471067-large",
    spk_model="/mnt/data/toolbox_dir/voice_trans/speech_campplus_sv_zh-cn_16k-common",
    device="cuda:2",
)
start_time = time.time()
res = model.generate(
    input=wav_file,
    batch_size_s=300,
    batch_size=1,
)
...
```

question

On 1.2.4, following the official example, running on CUDA with output_timestamp (VAD enabled) raises the error below; running on CPU does not. However, on CPU, some long audio files (over 30 minutes) produce a timestamp length that does not match the text length (when use_itn is enabled), or an error from `targets.size(-1) == 0` in funasr/models/sense_voice/utils/ctc_alignment.py (when use_itn is disabled). Please look into this.

## 🐛 Bug

/usr/local/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg,...

bug

## ❓ Questions and Help

#### What is your question?

For the LLM-ASR task, I trained on AISHELL with the default whisper_qwen_linear.yaml for 10 epochs and ran inference with best_model.pt.

First run: the default whisper_qwen_linear.yaml includes SpecAugLFR, and inference very frequently produced nonsensical repetitions.
e.g. BAC009S0768W0178 撇油加加撇油加加撇油加加撇油加加撇油加加撇油。。。

Second run: after removing all dropout and SpecAugLFR from the default whisper_qwen_linear.yaml and retraining, the nonsensical repetitions became much rarer during inference, though they still occur occasionally. The problem shifted: one or two extra characters now appear at the start of the decoded output. I checked the mask and it seems fine, and I also tried disabling the prompt at inference time, but the result did not change.
e.g. BAC009S0766W0399 幢经过近两个星期的漫长等待 (reference: 经过近两个星期的漫长等待)

Relevant config and shell scripts:
conf: https://github.com/NiniAndy/FunASR/blob/mymerge/examples/industrial_data_pretraining/llm_asr/conf/whisper_qwen_linear.yaml
train.sh: https://github.com/NiniAndy/FunASR/blob/mymerge/examples/industrial_data_pretraining/llm_asr/demo_train_or_finetune.sh
inference.sh: https://github.com/NiniAndy/FunASR/blob/mymerge/examples/industrial_data_pretraining/llm_asr/infer_speech2text.sh

#### What's your environment?...

question

I fine-tuned following the official finetune.sh script; why does no loss value appear in log.txt?

question

At present the UniASR Min Nan (Hokkien) model gives the best results and has a real user base, so why is there no longer a plan to support on-device deployment for this model?

I am currently facing an issue with using multiple GPUs simultaneously when running inference on vLLM with Xinference. The setup works correctly when using a single GPU with smaller models,...

question

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"
model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input="D:\\demo\\demo_1\\recording.wav",
    cache={},
    language="auto",  # "zn",...
```

question