Strange audio: as soon as this utterance is included, decoding slows down. Decoded on its own, its RTF is also 10x slower than other single utterances.
Is the recognized text correct?
@FireRedTeam The recognition result is correct; it is just slow. Single-utterance decoding: other utterances get RTF 0.03, while this one gets RTF 0.3. Batch decoding: without this utterance the batch RTF is 0.03; with it included, the batch RTF is 0.3. That is a straight 10x slowdown.
How long is this utterance? Are the sample rate and bit depth what the model expects?
How long is the input audio, and how long is the output text?
```
soxi 053chunk_0222153_0233010_speaker-00.wav

Input File     : '053chunk_0222153_0233010_speaker-00.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:10.93 = 174832 samples ~ 819.525 CDDA sectors
File Size      : 350k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
```
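As a quick way to verify the same header fields in code that soxi reports above, here is a minimal sketch using Python's standard `wave` module. `check_wav` is a hypothetical helper, not part of FireRedASR:

```python
import wave

def check_wav(path, expect_rate=16000, expect_width=2, expect_channels=1):
    """Return duration in seconds; fail if the header differs from what the model expects."""
    with wave.open(path, "rb") as f:
        assert f.getnchannels() == expect_channels, "expected mono"
        assert f.getframerate() == expect_rate, "expected 16 kHz"
        assert f.getsampwidth() == expect_width, "expected 16-bit signed PCM"
        return f.getnframes() / f.getframerate()

# e.g. check_wav("053chunk_0222153_0233010_speaker-00.wav")
# should return roughly 10.93 for the file shown above
```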
https://github.com/user-attachments/assets/fcc9cf2c-32e3-4f5a-aca7-2e37e6a891d5
I tested it with https://k2-fsa.github.io/sherpa/onnx/FireRedAsr/pretrained.html#sherpa-onnx-fire-red-asr-large-zh-en-2025-02-16-chinese-english. Compared with sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav, this audio's CPU RTF is about the same; there is no big difference.
```
/star-fj/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx --tokens=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt --num-threads=1 ./053chunk_0222153_0233010_speaker-00.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!
./053chunk_0222153_0233010_speaker-00.wav
{"lang": "", "emotion": "", "event": "", "text": "进去以后只见一位美人浓妆艳抹坐在床上见两人进来躬身相迎", "timestamps": [], "tokens":["进", "去", "以", "后", "只", "见", "一", "位", "美", "人", "浓", "妆", "艳", "抹", "坐", "在", "床", "上", "见", "两", "人", "进", "来", "躬", "身", "相", "迎"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 12.369 s
Real time factor (RTF): 12.369 / 10.927 = 1.132
```
```
/star-fj/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:375 ./build/bin/sherpa-onnx-offline --fire-red-asr-encoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx --fire-red-asr-decoder=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx --tokens=./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt --num-threads=1 ./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/encoder.int8.onnx", decoder="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/decoder.int8.onnx"), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), telespeech_ctc="", tokens="./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="")
Creating recognizer ...
Started
Done!
./sherpa-onnx-fire-red-asr-large-zh_en-2025-02-16/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三", "timestamps": [], "tokens":["昨", "天", "是", " MO", "ND", "AY", " TO", "D", "AY", " IS", "礼", "拜", "二", " THE", " DAY", " AFTER", " TO", "M", "OR", "ROW", "是", "星", "期", "三"], "words": []}
----
num threads: 1
decoding method: greedy_search
Elapsed seconds: 11.134 s
Real time factor (RTF): 11.134 / 10.053 = 1.108
```
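For context, the RTF in these logs is simply elapsed wall-clock time divided by audio duration, so the two runs can be compared directly:

```python
def rtf(elapsed_s, audio_s):
    """Real time factor: seconds of processing per second of audio."""
    return elapsed_s / audio_s

# Numbers taken from the two runs above:
print(round(rtf(12.369, 10.927), 3))  # 1.132 -- the reportedly slow utterance
print(round(rtf(11.134, 10.053), 3))  # 1.108 -- the bundled test wav
```

Both runs land around 1.1, which is why the two files look equivalent on CPU here.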
@FireRedTeam Are there any plans to open-source a smaller model? This large model runs too slowly on CPU.
I saw this message in a WeChat group. The Lantern Festival has already passed; hoping the xs-configuration model gets open-sourced.
@csukuangfj Could you add me to the WeChat group? I have some questions I'd like to discuss. WeChat ID: leo_liu_
The documentation explains how to join the WeChat group. See https://k2-fsa.github.io/sherpa/social-groups.html
I call it directly from Python, roughly like this:

```python
from fireredasr.models.fireredasr import FireRedAsr
import json
import os
import pandas as pd
from datetime import datetime

input_dir = "/root/storage/liuhuang.lh/workspace/nlp_llm/FireRedASR/0219_audiobbok"
output_jsonfile = "/root/storage/liuhuang.lh/workspace/nlp_llm/FireRedASR/0219_audiobbok/audiobook_60.csv"

# Collect utterance IDs and wav paths from the input directory.
wav_lists = os.listdir(input_dir)
batch_uttid = []
batch_wav_path = []
for item in wav_lists:
    if item.endswith(".wav"):
        # Note: str.strip(".wav") strips characters, not the suffix;
        # os.path.splitext drops the extension correctly.
        batch_uttid.append(os.path.splitext(item)[0])
        batch_wav_path.append(os.path.join(input_dir, item))

model = FireRedAsr.from_pretrained("aed", "pretrained_models/FireRedASR-AED-L")
print("load model success.")

print(f"{datetime.now()}: start transcribe...")
results = model.transcribe(
    batch_uttid[0:60],
    batch_wav_path[0:60],
    {
        "use_gpu": 1,
        "beam_size": 3,
        "nbest": 1,
        "decode_max_len": 0,
        "softmax_smoothing": 1.25,
        "aed_length_penalty": 0.6,
        "eos_penalty": 1.0,
    },
)
print(results)
```
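To pin down which file drags a batch down, one option is to decode each file on its own and rank per-utterance RTF. This is a sketch under the assumption that `model.transcribe` accepts single-element ID/path lists as in the script above; `get_duration` and `per_utt_rtf` are hypothetical helpers, not FireRedASR APIs:

```python
import time
import wave

def get_duration(path):
    """Audio duration in seconds, read from the wav header."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()

def per_utt_rtf(model, uttids, paths, opts):
    """Decode one file at a time and return (uttid, rtf) pairs, slowest first."""
    stats = []
    for uid, path in zip(uttids, paths):
        t0 = time.perf_counter()
        model.transcribe([uid], [path], opts)
        elapsed = time.perf_counter() - t0
        stats.append((uid, elapsed / get_duration(path)))
    return sorted(stats, key=lambda x: x[1], reverse=True)
```

Running this over the same 60 files should make the outlier utterance show up at the top of the list.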