FireRedASR-LLM-L CUDA out of memory
According to nvidia-smi, I have 24 GB free on each of four RTX 4090s. Still, when I run
speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L
I get
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 260.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 88.69 MiB is free. Including non-PyTorch memory, this process has 23.54 GiB memory in use. Of the allocated memory 23.15 GiB is allocated by PyTorch, and 14.12 MiB is reserved by PyTorch but unallocated.
Also, the code seems to ignore the CUDA_VISIBLE_DEVICES environment variable.
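One thing worth checking: CUDA_VISIBLE_DEVICES only takes effect if it is set before PyTorch initializes CUDA, so exporting it after the process has already touched a GPU does nothing. A minimal sanity check, independent of FireRedASR:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must happen before any CUDA call

import torch
print(torch.cuda.device_count())  # 1 if the mask took effect, 4 otherwise
```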
I found that the model takes roughly 32 GB. How can I run it on two 24 GB GPUs?
You probably need to look into running inference with something like DeepSpeed. Or try getting it running on CPU first. Or use float16.
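For the two-GPU question: I have not tried this with FireRedASR, but since the technical report says the LLM is Qwen2-7B-Instruct, the usual trick for the LLM half is transformers/accelerate's `device_map="auto"`, which shards the layers across all visible cards. A rough sketch only; the Qwen2 path below is a guess, and the sharded module would still have to be patched back into the repo's model object:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical path: point this at wherever the Qwen2 weights actually live.
llm_dir = "pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct"

# device_map="auto" lets accelerate spread the fp16 LLM weights across
# both 24 GB cards instead of loading everything onto GPU 0.
llm = AutoModelForCausalLM.from_pretrained(
    llm_dir,
    torch_dtype=torch.float16,
    device_map="auto",
)
print(llm.hf_device_map)  # shows which layers landed on which GPU
```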
After I call model.half(), transcribe errors out with:
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
How do I convert it to float16?
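Not from the maintainers, but that RuntimeError just means the weights went to fp16 while the input features stayed fp32, so after model.half() the feature tensor has to be cast as well. A self-contained reproduction of the mismatch and the fix (CUDA required):

```python
import torch

# fp16 weights, fp32 input: the same mismatch transcribe() hits.
conv = torch.nn.Conv1d(80, 256, kernel_size=3).cuda().half()
feats = torch.randn(1, 80, 100, device="cuda")  # fp32, like the extracted features

# conv(feats) raises: Input type (float) and bias type (c10::Half) ...
out = conv(feats.half())  # casting the input to fp16 fixes it
print(out.dtype)  # torch.float16
```

In the repo that means casting whatever tensor feeds the encoder inside transcribe (or wherever the features are built) after calling model.half().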
Is the LLM run in float16? The technical report says the LLM was fine-tuned with LoRA; are those parameters in the encoder? @FireRedTeam
If I want to run inference with vLLM or DeepSpeed, how should I do it? Can you provide a script or a recipe?
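Not an official script, but the generic DeepSpeed AutoTP pattern for the LLM sub-module looks roughly like this. The path is hypothetical, kernel support for Qwen2 depends on your DeepSpeed version, and the audio encoder plus the repo's decoding loop still have to be wired in around it:

```python
# Launch with: deepspeed --num_gpus 2 ds_infer.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM

llm_dir = "pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct"  # hypothetical path

model = AutoModelForCausalLM.from_pretrained(llm_dir, torch_dtype=torch.float16)

# Tensor-parallel inference: each of the 2 GPUs holds roughly half of
# every layer's weights, so the fp16 LLM fits on two 24 GB cards.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
)
```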
Is the LLM run in float16? The technical report says the LLM was fine-tuned with LoRA; are those parameters in the encoder? @FireRedTeam
All of FireRedASR-LLM-L's weights are in the HF repo.
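If it helps to verify where the LoRA parameters ended up, listing the checkpoint's parameter names should settle it; the filename below is a guess, so substitute whatever the HF repo actually ships:

```python
import torch

# Hypothetical filename: check the actual checkpoint name in the HF repo.
ckpt = torch.load("pretrained_models/FireRedASR-LLM-L/model.pth.tar",
                  map_location="cpu")
state = ckpt.get("model_state_dict", ckpt)  # checkpoints sometimes nest the state dict

# Parameter names reveal whether LoRA weights sit in the encoder,
# in the LLM, or have already been merged into the base weights.
for name, tensor in state.items():
    if "lora" in name.lower():
        print(name, tuple(tensor.shape))
```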