FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition...
Could you please let me know whether the ws_meeting and ws_net datasets will be open-sourced? I can't find them online at the moment, and I look forward to using this...
How can I use multiple GPUs?
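One common workaround, assuming the stock speech2text.py is single-GPU: run one process per GPU and shard the wav list across them via CUDA_VISIBLE_DEVICES. This is a hypothetical launcher sketch, not part of FireRedASR; the flags mirror the command lines quoted elsewhere in this thread.

```python
import os
import subprocess

def shard(wavs, n_gpus):
    """Round-robin the wav list into one chunk per GPU."""
    return [wavs[i::n_gpus] for i in range(n_gpus)]

def launch(wavs, n_gpus, model_dir):
    """Start one single-GPU speech2text.py process per shard and wait for all."""
    procs = []
    for gpu_id, chunk in enumerate(shard(wavs, n_gpus)):
        if not chunk:
            continue
        # Each process sees exactly one GPU, so the script stays unmodified.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        procs.append(subprocess.Popen(
            ["python", "speech2text.py", "--asr_type", "llm",
             "--model_dir", model_dir, "--wav_paths", *chunk,
             "--output", f"out/part{gpu_id}.txt"],
            env=env))
    for p in procs:
        p.wait()
```

The per-GPU output files would then need to be concatenated afterwards.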
Error: srts = asr_task(wavs, asr_type=model) raises OutOfMemoryError: CUDA out of memory. Tried to allocate 1.85 GiB. GPU 0 has a total capacity of 11.90 GiB of which 989.88 MiB is free....
I have been trying for a while, but still have no clue.
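A frequent trigger for this OOM is passing the whole wav list to asr_task in one forward pass; the 1.85 GiB allocation fails because only ~0.97 GiB is free on the 12 GiB card. Splitting the input into small batches bounds the peak allocation. The chunking helper below is a generic sketch, not a FireRedASR API:

```python
def batched(wavs, batch_size):
    """Yield fixed-size chunks so each forward pass allocates less at once."""
    for i in range(0, len(wavs), batch_size):
        yield wavs[i:i + batch_size]
```

Usage would then be, hedged on the calling convention shown in the error above: `for chunk in batched(wavs, 2): srts += asr_task(chunk, asr_type=model)`. If even batch_size=1 still OOMs, the memory has to be freed elsewhere (other processes on the GPU, fp16 inference, or shorter inputs).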
The raw input [[{'role': 'user', 'content': '请转写音频为文字'}, {'role': 'assistant', 'content': ''}]] is converted to token ids [[151644, 872, 198, 151646, 14880, 46670, 61443, 111268, 17714, 87335, 151645, 198, 151644, 77091, 198]], where the token id of "speech" is 151646. But the weight shape printed inside the embedding function is torch.Size([151646, 3584]), which raises the error "index out of range in...
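The numbers in the report explain the crash: an nn.Embedding with weight shape [151646, 3584] only accepts ids 0..151645, so looking up the "speech" id 151646 is exactly one row past the end of the table. A minimal reproduction and a resize sketch (embedding dim shrunk to 8 for the demo; how FireRedASR actually registers the speech token may differ, e.g. via a resize_token_embeddings-style call):

```python
import torch
import torch.nn as nn

# Reproduce the report: valid ids are 0..vocab-1, and speech_id == vocab.
vocab, dim, speech_id = 151646, 8, 151646

emb = nn.Embedding(vocab, dim)
err = None
try:
    emb(torch.tensor([speech_id]))
except IndexError as e:
    err = e  # "index out of range in self"

# Sketch of a fix: grow the table so every special-token id has a row.
new_emb = nn.Embedding(speech_id + 1, dim)
with torch.no_grad():
    new_emb.weight[:vocab] = emb.weight  # keep the original rows
out = new_emb(torch.tensor([speech_id]))  # succeeds now
```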
When batch != 1, decoding with the LLM produces some cases of repeated output. What causes this, and how can it be fixed?
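Repetition that appears only at batch > 1 is often a padding/attention-mask issue, but it can also be damped at decode time; the command line quoted later in this thread already exposes --repetition_penalty. A minimal sketch of the standard CTRL-style penalty on one logits vector (a generic illustration, not FireRedASR's decoding code):

```python
import torch

def apply_repetition_penalty(logits, generated_ids, penalty=2.0):
    """Down-weight tokens already emitted in this hypothesis."""
    for tok in set(generated_ids):
        score = logits[tok]
        # Divide positive scores, multiply negative ones, so both move down.
        logits[tok] = score / penalty if score > 0 else score * penalty
    return logits
```

With penalty > 1 every previously emitted token becomes less likely to be picked again; a value that is too high can instead truncate legitimate repeats.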
Whenever this clip is included, whether batched or recognized on its own, the RTF is 10× slower. Strange... [error wav rtf slow 10times.zip](https://github.com/user-attachments/files/18983673/error.wav.rtf.slow.10times.zip)
According to nvidia-smi, I have 24 GB free on each of four RTX 4090s. Still, when I run speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L, I get torch.OutOfMemoryError: CUDA out of memory....
Traceback (most recent call last):
  File "/Users/guowenchao/Job/AI/FireRedTeadASR/FireRedASR/examples/fireredasr/speech2text.py", line 105, in <module>
    main(args)
  File "/Users/guowenchao/Job/AI/FireRedTeadASR/FireRedASR/examples/fireredasr/speech2text.py", line 54, in main
    results = model.transcribe(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/fireredasr/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File...
Namespace(asr_type='llm', model_dir='/root/tts_asr/FireRedASR/examples/pretrained_models/FireRedASR-LLM-L', wav_path='wav/cmd1740120703406.wav', wav_paths=None, wav_dir=None, wav_scp=None, output='out/llm-l-asr.txt', use_gpu=1, batch_size=1, beam_size=3, decode_max_len=0, nbest=1, softmax_smoothing=1.0, aed_length_penalty=0.0, eos_penalty=1.0, decode_min_len=0, repetition_penalty=3.0, llm_length_penalty=1.0, temperature=1.0)
#wavs=1
model args: Namespace(input_length_max=30.0, input_length_min=0.1, output_length_max=150, output_length_min=1, freeze_encoder=0, encoder_downsample_rate=2, freeze_llm=0, use_flash_attn=0,...
Just like Whisper does.