FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Location: https://github.com/alibaba-damo-academy/FunASR/blame/6e86c5044d30dffe356b6e42838d01b7cfaf4272/README.md#L158C2-L158C3

The original code:

```python
wav_file = f"{model.model_path}/example/asr_example.wav"
```

I guess you meant:

```python
wav_file = f"{model.model_path}/example/vad_example.wav"
```
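For context, a minimal sketch of the README snippet this line belongs to, assuming the `AutoModel` API and the `fsmn-vad` model alias that FunASR's README uses:

```python
from funasr import AutoModel

# load the FSMN VAD model that the README example is demonstrating
model = AutoModel(model="fsmn-vad")

# the line in question: the VAD example should load vad_example.wav
wav_file = f"{model.model_path}/example/vad_example.wav"
res = model.generate(input=wav_file)
print(res)  # VAD segments as [[start_ms, end_ms], ...]
```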
```
from funasr import AutoModel
  File "C:\Users\loong\.conda\envs\nlp\lib\site-packages\funasr\__init__.py", line 33, in <module>
    from funasr.auto.auto_model import AutoModel
  File "C:\Users\loong\.conda\envs\nlp\lib\site-packages\funasr\auto\auto_model.py", line 19, in <module>
    from funasr.utils.load_utils import load_bytes
  File "C:\Users\loong\.conda\envs\nlp\lib\site-packages\funasr\utils\load_utils.py", line 8, in <module>
    import torchaudio
...
```
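The traceback dies at `import torchaudio`, so a quick sanity check, assuming a missing or mismatched torchaudio install, is to import it directly before touching funasr:

```python
# if this fails, or the two versions come from different builds, reinstall
# torchaudio against the same torch build before retrying funasr
import torch
import torchaudio

print(torch.__version__, torchaudio.__version__)
```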
Thanks for open-sourcing this excellent work. Since only Chinese and English are currently supported, I plan to use the faster-whisper model for ASR in other languages. Does this project support using faster-whisper models?
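Nothing in this issue confirms a faster-whisper integration in FunASR; for reference, a standalone faster-whisper call (its own API, separate from FunASR, with illustrative model and device choices) looks roughly like this:

```python
from faster_whisper import WhisperModel

# model size, device, and precision here are example choices
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", beam_size=5)
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```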
System: Ubuntu 22.04
Versions: funasr==1.0.18, modelscope==1.11.1
Inference code:

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online',
    model_revision='v2.0.4',
    vad_model='iic/speech_fsmn_vad_zh-cn-16k-common-pytorch',
    vad_model_revision="v2.0.4",
    vad_kwargs={"max_single_segment_time": 60000},
    punc_model='iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch',
    punc_model_revision="v2.0.4",
)
rec_result = inference_pipeline(input='./0325.wav')
print(rec_result[0])
```

Problem: 0325.wav is about 4 minutes long and inference fails on it, while the first 10 seconds of the same audio transcribes fine.
Error message: ...
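Since the first 10 seconds work, a hedged workaround sketch is to chunk the file and feed each piece to the same pipeline; this narrows the failure down to length rather than fixing it:

```python
import soundfile as sf

audio, sr = sf.read('./0325.wav')
chunk = 10 * sr  # 10-second pieces, matching the length known to work
for i in range(0, len(audio), chunk):
    part = f'./0325_part{i // chunk}.wav'
    sf.write(part, audio[i:i + chunk], sr)
    print(inference_pipeline(input=part)[0])  # pipeline from the report above
```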
Background: I am exploring how to deploy Paraformer on edge devices, running inference on the NPU through the RK framework, which only supports fp16-precision models. FP16 can represent roughly [-65504, 65504], while FP32 covers [-3.4×10^38, 3.4×10^38], so converting an FP32 model directly to an RK model overflows (NaN) during inference. I followed the FunASR tutorial https://github.com/alibaba-damo-academy/FunASR/blob/v0.8.8/funasr/export/README.md for INT8 quantization, but that scheme is dynamic quantization: at compute time the values are dequantized back to fp32. Question: is there a true fp16 model, or a finetuning recipe that produces one?
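The overflow claim is easy to verify numerically; a two-line demonstration of the fp16 range limit described above:

```python
import numpy as np

print(np.finfo(np.float16).max)     # 65504.0, the largest finite fp16 value
print(np.float16(np.float32(7e4)))  # inf: 70000 overflows when cast to fp16
```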
https://github.com/alibaba-damo-academy/FunASR/issues/1478
https://www.modelscope.cn/models/dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/summary

```
model_name_or_model_dir="dengcunqin/speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online"
model_revision="master"

torchrun \
  --nnodes 1 \
  --nproc_per_node ${gpu_num} \
  funasr/bin/train.py \
  ++model="${model_name_or_model_dir}" \
  ++model_revision="${model_revision}" \
  ++train_data_set_list="${train_data}" \
  ++valid_data_set_list="${val_data}" \
  ++dataset_conf.batch_size=64 \
  ++dataset_conf.batch_type="token" \
  ++dataset_conf.num_workers=4 \
  ++train_conf.max_epoch=50 \
  ...
```
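The `${train_data}`/`${val_data}` variables above point at data lists; a hedged sketch of building one, assuming the jsonl schema used in FunASR's training examples (one key/source/target record per line — an assumption, not taken from this issue):

```python
import json

# (utterance id, wav path, transcript) triples for your own corpus
samples = [("utt1", "/data/wav/utt1.wav", "transcript text")]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for key, wav, text in samples:
        f.write(json.dumps({"key": key, "source": wav, "target": text},
                           ensure_ascii=False) + "\n")
```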
To reproduce: run the project as-is; the segment times returned by the `GetSegments` function of an `AliFsmnVad` instance exceed the actual audio duration.

Environment:
- OS: Windows 11
- FunASR version: latest
- Microsoft.ML.OnnxRuntime: latest

Test audio: the asr_example.wav provided in speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch\example
Download: https://www.modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/file/view/master/example%2Fasr_example.wav?status=0
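A hedged cross-check sketch: run the same wav through FunASR's Python fsmn-vad and compare the segment end times against the true file length, to see whether the C# `AliFsmnVad` wrapper or the model itself is off (model alias and output shape assumed from the FunASR README):

```python
import soundfile as sf
from funasr import AutoModel

wav = "asr_example.wav"
audio, sr = sf.read(wav)
duration_ms = len(audio) / sr * 1000

vad = AutoModel(model="fsmn-vad")
segments = vad.generate(input=wav)[0]["value"]  # [[start_ms, end_ms], ...]
print(f"file: {duration_ms:.0f} ms, segments: {segments}")
for _, end in segments:
    assert end <= duration_ms, "segment ends past the end of the file"
```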
In my audio, the gap between the teacher's question and the student's answer is about one second. I reduced max_end_silence_time to 500 ms to try to pin down sentence endings more precisely, but it had no effect and the teacher's and student's speech still cannot be cleanly separated. What other settings are worth trying?

**vad_model configuration:**
```
frontend: WavFrontendOnline
frontend_conf:
  fs: 16000
  window: hamming
  n_mels: 80
  frame_length: 25
  frame_shift: 10
  dither: 0.0
  lfr_m: 5
  lfr_n: 1
model: FsmnVADStreaming
model_conf:
  sample_rate: 16000
  detect_mode: 1
...
```
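One thing worth verifying is that the override actually reaches the model. A minimal sketch that passes VAD overrides through `vad_kwargs`, the same mechanism the modelscope pipeline report above uses; the `speech_noise_thres` value is an assumption (a knob from the same FSMN-VAD config family), not a confirmed fix:

```python
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    vad_kwargs={
        "max_end_silence_time": 500,  # ms of trailing silence before a cut
        "speech_noise_thres": 0.8,    # assumption: stricter speech/noise split
    },
)
print(model.generate(input="lesson.wav"))
```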
When finetuning the downloaded speech_paraformer_asr_nat-zh-cn-8k-common-vocab8358-tensorflow1 model on my own dataset with finetune.sh, warnings like the following appear:

```
grad.sizes() = [1, 320], strides() = [1, 1]
bucket_view.sizes() = [1, 320], strides() = [320, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to...
```
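This is PyTorch DDP's grad-stride/bucket-view mismatch warning. A hedged sketch of the usual general-PyTorch mitigation (not a documented FunASR switch) is to construct DDP with `gradient_as_bucket_view=True`, so gradients share the bucket's memory layout instead of being copied:

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# inside an initialized process group, wrapping whatever module finetune.sh builds
net = nn.Linear(320, 320).cuda()
ddp = DDP(net, gradient_as_bucket_view=True)  # avoids the stride-mismatch copy
```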