FunASR

A fundamental end-to-end speech recognition toolkit and open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, and more.

Results 484 FunASR issues
Sort by recently updated

1. Environment: OS: linux; python: 3.8; torch: 2.0.0; modelscope: 1.9.3; GPU: P100, driver 535; CUDA: 11.7. 2. Code: ``` from modelscope.pipelines import pipeline from modelscope.utils.constant import Tasks from modelscope.utils.logger import get_logger import tracemalloc import logging tracemalloc.start() logger =...
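For memory-growth reports like this one, the standard-library tracemalloc module (which the snippet above already starts) can localize which allocation sites grow across repeated pipeline calls. A minimal sketch, with a stand-in allocation in place of the real pipeline call:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# stand-in for the repeated pipeline() calls suspected of leaking
retained = [bytearray(10_000) for _ in range(100)]

after = tracemalloc.take_snapshot()
# rank allocation sites by how much their footprint grew between snapshots
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```

If the same source line keeps climbing across iterations, that line is the likely leak site.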

Environment: OS: linux; python: 3.8.16; modelscope: 1.9.4; funasr: 0.8.4; GPU: T4; CUDA: 11.6. Code: ``` from modelscope.pipelines import pipeline from modelscope.utils.constant import Tasks from modelscope.utils.logger import get_logger import time import logging logger = get_logger(log_level=logging.CRITICAL) logger.setLevel(logging.CRITICAL)...

When I try to convert some audio files, I notice that the timestamps in the returned result don't look correct. For example, the total duration of the audio file is...
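A quick way to sanity-check returned timestamps is to verify that every (start, end) pair is ordered and falls within the audio's total duration. A small hypothetical helper (units assumed to be milliseconds; adjust to whatever the pipeline actually returns):

```python
def timestamps_valid(timestamps, duration_ms):
    """Return True if every (start, end) pair is ordered and inside the audio."""
    return all(0 <= start <= end <= duration_ms for start, end in timestamps)

print(timestamps_valid([(0, 480), (480, 950)], 1000))   # True
print(timestamps_valid([(0, 480), (480, 1200)], 1000))  # False: end past total duration
```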

Hello, the fine-tuning data format uses text and wav.scp. Some of my data also comes with a segments file, where each entry points to a slice of an audio file, and cutting everything into small files would waste disk space. Could the data format support segments as well? Thanks!
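For context, a Kaldi-style segments file maps each utterance to a time slice of a recording, one `utt_id rec_id start end` line per segment (times in seconds), so the audio never has to be physically cut. A small parser sketch (names illustrative, not FunASR's loader):

```python
def parse_segments(text):
    """Parse Kaldi-style segments lines into (utt_id, rec_id, start_s, end_s)."""
    segments = []
    for line in text.strip().splitlines():
        utt_id, rec_id, start, end = line.split()
        segments.append((utt_id, rec_id, float(start), float(end)))
    return segments

segs = parse_segments("utt1 rec1 0.00 3.25\nutt2 rec1 3.25 7.80")
print(segs[0])  # ('utt1', 'rec1', 0.0, 3.25)
```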

Hello, I am using the officially provided script (finetune.py): import os from modelscope.metainfo import Trainers from modelscope.trainers import build_trainer from funasr.datasets.ms_dataset import MsDataset from funasr.utils.modelscope_param import modelscope_args def modelscope_finetune(params): if not os.path.exists(params.output_dir): os.makedirs(params.output_dir, exist_ok=True) # dataset split ["train",...

bug

CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node 2 --master_port=29501 finetune.py: when fine-tuning the damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online model on multiple GPUs, it errors out with: Task related config: error: unrecognized arguments: --local-rank=0 usage: Task related config [-h] [--config CONFIG] [--frontend {default,sliding_window,s3prl,fused,wav_frontend,multichannelfrontend}] [--frontend_conf FRONTEND_CONF] [--specaug...
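For background, torch.distributed.launch in recent torch versions passes the rank as `--local-rank` (hyphen), while many older training scripts only register `--local_rank` (underscore), which produces exactly this "unrecognized arguments" error. If patching the script is an option, one workaround is to register both spellings; a sketch, not FunASR's actual argument parser:

```python
import argparse

parser = argparse.ArgumentParser("Task related config")
# accept both the old underscore and the new hyphen spelling;
# argparse derives the attribute name (local_rank) from the first option string
parser.add_argument("--local_rank", "--local-rank", type=int, default=0)

args, _unknown = parser.parse_known_args(["--local-rank=1"])
print(args.local_rank)  # 1
```

Alternatively, launching with `torchrun` instead of `python -m torch.distributed.launch` communicates the rank via the LOCAL_RANK environment variable, sidestepping the command-line argument entirely.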

Hello, a quick question: when I set batch_size=8 for inference, does that mean 8 tokens are decoded simultaneously within a single utterance, or that 8 utterances are inferred at the same time?
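In most offline ASR pipelines, batch_size refers to the number of utterances decoded in parallel, not a per-utterance token count, though FunASR's own documentation should be taken as authoritative. A sketch of the grouping with a hypothetical helper:

```python
def batched(items, batch_size):
    """Yield successive groups of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

wavs = [f"utt{i}.wav" for i in range(10)]
# with batch_size=8, ten utterances are decoded as one batch of 8 and one of 2
print([len(group) for group in batched(wavs, 8)])  # [8, 2]
```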

Can the pipeline be used for concurrent recognition across multiple GPUs?
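One common pattern for multi-GPU concurrent recognition is to run one worker process per GPU, pinning each with CUDA_VISIBLE_DEVICES and building the pipeline once per process. A sketch of the work partitioning only (the pipeline construction and call are omitted; names are illustrative):

```python
import os

def partition(files, num_workers):
    """Round-robin split of the work list, one shard per GPU worker."""
    return [files[i::num_workers] for i in range(num_workers)]

def worker(gpu_id, shard):
    # pin this process to a single GPU before any CUDA library is loaded
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # build the ASR pipeline once here, then loop over the shard
    for wav in shard:
        pass  # result = asr_pipeline(audio_in=wav)

files = [f"utt{i}.wav" for i in range(7)]
shards = partition(files, 2)
print([len(s) for s in shards])  # [4, 3]
```

Each worker would typically be started with multiprocessing.Process, with results collected through a queue.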

https://alibaba-damo-academy.github.io/FunASR/en/runtime/docs/benchmark_libtorch.html In the official libtorch CPU benchmark, the fp32 model reaches an RTF of 0.0066 with concurrent-tasks set to 32. Testing on libtorch 1.7, 1.10, and 1.12, I only get an RTF of about 0.03 at best, worse than the official numbers. Also, at the same concurrent-tasks setting, libtorch's RTF is much higher than ONNX's. What could be the cause?
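For reference when comparing such numbers, RTF (real-time factor) is processing time divided by audio duration, so lower is better, and under N concurrent tasks the wall-clock time is amortized across all streams. A minimal illustrative helper:

```python
def rtf(processing_seconds, audio_seconds):
    """Real-time factor: seconds of compute per second of audio (lower is better)."""
    return processing_seconds / audio_seconds

# e.g. decoding 1000 s of audio in 6.6 s of total wall time across all streams
print(round(rtf(6.6, 1000), 4))  # 0.0066
```

Differences of this magnitude often come down to thread oversubscription (intra-op threads times concurrent tasks exceeding physical cores), so pinning per-task thread counts is worth checking when reproducing the benchmark.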