FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
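Our experiments show that MossFormer2 performs considerably better than MossFormer, but the repo does not yet seem to contain any MossFormer2 code. Are there plans to release MossFormer2, and if so, on what timeline?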
OS: Linux (CentOS Linux release 7.8.2003 (Core))
Python/C++ Version: Python 3.8.18
Package Version: pytorch-wpe (0.0.1), torchaudio (2.1.0), modelscope (1.9.2), funasr (0.8.0)
Model: speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx, punc_ct-transformer_zh-cn-common-vocab272727-onnx, speech_fsmn_vad_zh-cn-16k-common-onnx
Command: python3 funasr_wss_client.py --host 127.0.0.1 --port 10095 --mode offline --audio_in /data/asr/temp/replay.1693548503.57554748.wav --send_without_sleep --output_dir /data/asr/funasr-runtime-resources/samples/python
Details: The...
OS: Linux
Python: 3.10
Package: pytorch 1.13.1, modelscope 1.3.0, funasr 0.3.0
Model: auto-speech-recognition
Command: inference_pipeline = pipeline(task=Tasks.auto_speech_recognition, model=params["model_dir"], model_revision="v1.2.1", vad_model='damo/speech_fsmn_vad_zh-cn-16k-common-pytorch', vad_model_revision="v1.1.8", punc_model='damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch', punc_model_revision="v1.1.6", batch_size=64)
Error log: I am loading the model in a local environment. When the machine can reach the public internet, the model starts normally, as shown below. However, on a server that cannot reach the public internet, the model fails to start, with the following problem...
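Is it currently possible to fine-tune a pretrained model with a different vocabulary size? For example, fine-tuning a pretrained model whose original vocabulary size is 8404 on a dataset with a vocabulary of around 200.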
https://github.com/alibaba-damo-academy/FunASR/blob/4854d398708594a13e3043daf1a19adfde970ea2/funasr/modules/lora/layers.py#L216 See the loralib issue: https://github.com/microsoft/LoRA/issues/34 Inference does not call eval(), so the weights are never merged. The code from their branch can be used instead: https://github.com/microsoft/LoRA/tree/bugfix_MergedLinear While I'm at it, here is a blog post by 苏神 (Su Jianlin): https://kexue.fm/archives/9590
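To make the point about eval() concrete, here is a minimal sketch of the merge-on-mode-switch pattern that LoRA layers typically follow. It is a toy scalar layer written in plain Python, not the FunASR or loralib code; the class name `ToyLoRALinear` and all of its attributes are hypothetical and exist only for illustration.

```python
class ToyLoRALinear:
    """Toy scalar 'linear layer' y = weight * x with a LoRA-style update.

    Hypothetical illustration only; this is not the FunASR or loralib class.
    """

    def __init__(self, weight, lora_a, lora_b, scaling=1):
        self.weight = weight      # frozen base weight
        self.lora_a = lora_a      # low-rank factor A (a scalar in this toy)
        self.lora_b = lora_b      # low-rank factor B (a scalar in this toy)
        self.scaling = scaling
        self.training = True
        self.merged = False       # True once the update is folded into weight

    def delta(self):
        # The LoRA update that gets added to the base weight when merged.
        return self.lora_b * self.lora_a * self.scaling

    def train(self, mode=True):
        # Merging and un-merging happen only here, when the mode is switched.
        if mode and self.merged:
            self.weight -= self.delta()
            self.merged = False
        elif not mode and not self.merged:
            self.weight += self.delta()
            self.merged = True
        self.training = mode
        return self

    def eval(self):
        return self.train(False)

    def forward(self, x):
        if self.merged:
            return self.weight * x
        # Unmerged path: the update is applied on the fly.
        return self.weight * x + self.delta() * x


layer = ToyLoRALinear(weight=2, lora_a=3, lora_b=1)
print(layer.merged)   # the update has not been merged yet
layer.eval()
print(layer.merged)   # eval() triggered the merge
```

In this sketch the merge is tied to the mode switch, so inference code that never calls eval() leaves `merged` set to False and relies entirely on the unmerged forward path.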
I want to add a noise token for training the model. I added  to config.yaml and set ignore_init_mismatch: true. When training with the added noise token, what should the text file look like? 
I used the code provided in the tutorial, but it only generated the first two Chinese characters of the audio. Why?
OS: Linux
Python/C++ Version: Python 3.9.17
Package Version: pytorch 2.0.1, torchaudio 2.0.2, modelscope 1.9.0, funasr (pip list) 0.7.6
Model: speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
Command: python paraformer_infer.py --wav_dir xxxx --output_dir xxxx (this simply calls inference)
Details: [code]: inference_pipeline =...
```python
python funasr_wss_client.py
Namespace(audio_in=None, chunk_interval=10, chunk_size=[5, 10, 5], host='localhost', mode='2pass', output_dir=None, port=10095, send_without_sleep=True, ssl=1, thread_num=1, words_max_print=10000)
Namespace(audio_in=None, chunk_interval=10, chunk_size=[5, 10, 5], host='localhost', mode='2pass', output_dir=None, port=10095, send_without_sleep=True, ssl=1, thread_num=1, words_max_print=10000)
connect...
```
TN (text normalization) error
Chinese. Input: `拨打 0551-4376729 购买彩票.` Current output: `拨打 零五五一年四十三月七六七二九 购买彩票.` Target output: `拨打 零五五幺 四三七六七二九 购买彩票.`
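The target output reads the phone number digit by digit instead of parsing it as a date. A minimal sketch of that behaviour follows; it is not FunASR's TN implementation. The pattern `PHONE_PATTERN` (a 3 or 4 digit area code, a hyphen, then 7 or 8 digits) and the function names are assumptions made for illustration, and the reading of 1 as "幺" is taken from the target output above.

```python
import re

# Digit-by-digit readings; "幺" for 1 follows the target output in the issue.
DIGIT_READING = {
    "0": "零", "1": "幺", "2": "二", "3": "三", "4": "四",
    "5": "五", "6": "六", "7": "七", "8": "八", "9": "九",
}

# Assumed phone-number shape: area code, hyphen, subscriber number.
PHONE_PATTERN = re.compile(r"([0-9]{3,4})-([0-9]{7,8})")


def read_digits(digits: str) -> str:
    """Read a string of ASCII digits one character at a time."""
    return "".join(DIGIT_READING[d] for d in digits)


def normalize_phone_numbers(text: str) -> str:
    """Replace each matched phone number with its digit-by-digit reading."""
    def replace(match: re.Match) -> str:
        area, number = match.group(1), match.group(2)
        return f"{read_digits(area)} {read_digits(number)}"

    return PHONE_PATTERN.sub(replace, text)


print(normalize_phone_numbers("拨打 0551-4376729 购买彩票."))
# 拨打 零五五幺 四三七六七二九 购买彩票.
```

Matching the whole `area-number` shape before any date rule runs is what keeps `0551-43` from being read as a year and a month; a real TN grammar would need to rank such a rule above the date rule.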