wwfcnu

Results 104 comments of wwfcnu

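> @wwfcnu I asked a similar question. FunASR does not yet support multiple file formats; the only workaround is to use a tool such as sox or ffmpeg to convert the files to single-channel, 16000 Hz WAV. Note that the converted WAV file must be 16000 Hz, otherwise recognition quality drops sharply.

If MP3 support were added to the source code, would recognition be somewhat faster?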
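The conversion described in the quote above can be sketched as follows. This is a minimal example, assuming `ffmpeg` is installed and on `PATH`; the helper names `build_ffmpeg_cmd` and `convert_to_wav` are hypothetical and not part of FunASR. It only uses the standard `-i`, `-ac`, `-ar`, and `-y` options of ffmpeg.

```python
import subprocess
from typing import List


def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that converts `src` to a mono WAV at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", src,                 # input file (mp3, m4a, wav, ...)
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the target rate
        dst,                       # output container is inferred from the .wav extension
    ]


def convert_to_wav(src: str, dst: str, sample_rate: int = 16000) -> str:
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_cmd(src, dst, sample_rate), check=True)
    return dst
```

For example, `convert_to_wav("talk.mp3", "talk_16k.wav")` would write a single-channel 16000 Hz file that can then be passed to the recognizer. Use a `dst` different from `src`, since ffmpeg cannot convert a file in place.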

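> It is supported, but you need to upgrade to the latest version. Upgrade commands:
> `pip install "modelscope[audio_asr]" --upgrade -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html`
> `git clone https://github.com/alibaba/FunASR.git && cd FunASR`
> `pip install --editable ./`
>
> Multiple audio formats, including mp3, and different sample rates are all supported, so users do not need to worry about the input audio format. If you run into audio that is not supported, please report it. repo: https://github.com/alibaba-damo-academy/FunASR
>
> You are welcome to join the FunASR DingTalk group to discuss any problems you encounter: 27215013275

The main issue is the torchaudio version. Could the sox-based audio handling in the source code be replaced with ffmpeg?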

> I did a complete three-hour run of the model, following [the code here](https://github.com/alibaba-damo-academy/FunASR/blob/main/egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/demo_online_v2.py), and printed the number of executions and the running time. The full code is as follows: > > ``` > import os > import logging > import torch > import soundfile > > from modelscope.pipelines import pipeline > from modelscope.utils.constant import Tasks > from...

Using the image funasr-runtime-sdk-online-cpu-0.1.5.

> `2pass-online` for real-time recognition results and `2pass-offline` for 2-pass corrected recognition results

https://github.com/alibaba-damo-academy/FunASR/blob/main/runtime/docs/websocket_protocol_zh.md#%E4%BB%8E%E6%9C%8D%E5%8A%A1%E7%AB%AF%E5%BE%80%E5%AE%A2%E6%88%B7%E7%AB%AF%E5%8F%91%E6%95%B0%E6%8D%AE-1 There are only these 3 kinds.
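As a rough illustration of how a client might combine the two result types named in the quote, here is a minimal sketch. It assumes each server message is a JSON object carrying `mode` and `text` fields, as described in the linked protocol document; the function `apply_result` and the `state` dictionary are hypothetical and not part of the FunASR client.

```python
import json
from typing import Dict


def apply_result(state: Dict[str, str], raw_message: str) -> str:
    """Merge one server message into `state` and return the text to display.

    state["final"]   holds text already corrected by the second pass.
    state["partial"] holds real-time text not yet corrected.
    """
    msg = json.loads(raw_message)
    mode = msg.get("mode")        # assumed field name
    text = msg.get("text", "")    # assumed field name

    if mode == "2pass-online":
        # Real-time result: append to the uncorrected tail.
        state["partial"] += text
    elif mode == "2pass-offline":
        # Corrected result: it replaces the uncorrected tail.
        state["final"] += text
        state["partial"] = ""
    # Messages with any other mode leave the state unchanged.
    return state["final"] + state["partial"]
```

Start with `state = {"final": "", "partial": ""}` and call `apply_result` for every message received, displaying the returned string each time.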

KenLM doesn't seem to have a bin2arpa tool.

> 1. Edit `your/path/to/pyannote/speaker-diarization/config.yaml` > > ```yaml > pipeline: > name: pyannote.audio.pipelines.SpeakerDiarization > params: > clustering: AgglomerativeClustering > embedding: your/path/to/speechbrain/spkrec-ecapa-voxceleb # Folder, must contain `speechbrain` keyword. > embedding_batch_size: 32 >...