FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recogn...

Results 65 FireRedASR issues

Could you please let me know if the datasets ws_meeting and ws_net will be open-sourced? I can't find them online at the moment and I look forward to using this...

Error: srts = asr_task(wavs, asr_type=model) OutOfMemoryError: CUDA out of memory. Tried to allocate 1.85 GiB. GPU 0 has a total capacity of 11.90 GiB of which 989.88 MiB is free....

I have been trying for a while, but I still have no idea what is wrong.

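The original input [[{'role': 'user', 'content': '请转写音频为文字'}, {'role': 'assistant', 'content': ''}]] (the user prompt means "Please transcribe the audio into text") is converted to the token IDs [[151644, 872, 198, 151646, 14880, 46670, 61443, 111268, 17714, 87335, 151645, 198, 151644, 77091, 198]], where the token_id of "speech" is "151646". Printing the weight shape inside the embedding function gives torch.Size([151646, 3584]), and the error "index out of range in... is raised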
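The numbers quoted in this issue are enough to see why the lookup fails: an embedding table with 151646 rows accepts indices 0 through 151645, so token ID 151646 is one past the end. The check below is a minimal sketch in plain Python using only the values from the issue; the helper name `out_of_range_ids` is hypothetical and is not part of FireRedASR or PyTorch.

```python
# Token IDs quoted in the issue above.
token_ids = [[151644, 872, 198, 151646, 14880, 46670, 61443, 111268,
              17714, 87335, 151645, 198, 151644, 77091, 198]]

# Row count of the embedding weight reported in the issue:
# torch.Size([151646, 3584]) means valid indices are 0..151645.
num_embeddings = 151646


def out_of_range_ids(batch, num_rows):
    """Return the sorted unique IDs that cannot index a table with num_rows rows.

    Hypothetical helper for illustration only.
    """
    return sorted({tid for seq in batch for tid in seq
                   if not 0 <= tid < num_rows})


bad = out_of_range_ids(token_ids, num_embeddings)
print(bad)  # [151646]
```

Running a check like this before the embedding call shows which ID is at fault. It does not say why the tokenizer and the embedding table disagree in size; that depends on how the model and tokenizer were loaded, which the issue text does not show.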

With batch != 1, using the LLM model produces some cases of repeated decoding. What causes this, and how can it be fixed?

Whenever this utterance is included, whether in a batch or recognized on its own, the RTF is 10 times slower. Strange... [error wav rtf slow 10times.zip](https://github.com/user-attachments/files/18983673/error.wav.rtf.slow.10times.zip)

According to nvidia-smi, I have 24 GB free on four RTX 4090s. Still, when I run speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L, I get torch.OutOfMemoryError: CUDA out of memory....

Traceback (most recent call last):
  File "/Users/guowenchao/Job/AI/FireRedTeadASR/FireRedASR/examples/fireredasr/speech2text.py", line 105, in <module>
    main(args)
  File "/Users/guowenchao/Job/AI/FireRedTeadASR/FireRedASR/examples/fireredasr/speech2text.py", line 54, in main
    results = model.transcribe(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/fireredasr/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File...

Namespace(asr_type='llm', model_dir='/root/tts_asr/FireRedASR/examples/pretrained_models/FireRedASR-LLM-L', wav_path='wav/cmd1740120703406.wav', wav_paths=None, wav_dir=None, wav_scp=None, output='out/llm-l-asr.txt', use_gpu=1, batch_size=1, beam_size=3, decode_max_len=0, nbest=1, softmax_smoothing=1.0, aed_length_penalty=0.0, eos_penalty=1.0, decode_min_len=0, repetition_penalty=3.0, llm_length_penalty=1.0, temperature=1.0) #wavs=1 model args: Namespace(input_length_max=30.0, input_length_min=0.1, output_length_max=150, output_length_min=1, freeze_encoder=0, encoder_downsample_rate=2, freeze_llm=0, use_flash_attn=0,...