
FireRedASR-LLM-L CUDA out of memory

Open Youjin1985 opened this issue 10 months ago • 6 comments

According to nvidia-smi, I have 24 GB free on each of four RTX 4090s. Still, when I run

speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L

I get

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 260.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 88.69 MiB is free. Including non-PyTorch memory, this process has 23.54 GiB memory in use. Of the allocated memory 23.15 GiB is allocated by PyTorch, and 14.12 MiB is reserved by PyTorch but unallocated.

Also, the code seems to ignore the CUDA_VISIBLE_DEVICES environment variable.


Youjin1985 avatar Feb 19 '25 15:02 Youjin1985
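One possible explanation for the ignored CUDA_VISIBLE_DEVICES (a sketch, not verified against the FireRedASR code): the variable only takes effect if it is set before the CUDA runtime is initialized, and the safest point for that is before the first `import torch`. Setting it inside Python afterwards is silently ignored.

```python
import os

# Hypothetical workaround: set CUDA_VISIBLE_DEVICES before torch is imported.
# If speech2text.py (or a module it imports) touches torch first, exporting
# the variable later has no effect on device enumeration.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only GPU 1 to this process

import torch  # must come after the assignment above

print(torch.cuda.device_count())  # number of GPUs this process can see
```

Alternatively, set the variable in the shell that launches the script (`CUDA_VISIBLE_DEVICES=1 python speech2text.py ...`), which guarantees it is in place before any import runs.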

I found that the model takes about 32 GB. How can I run it on two 24 GB GPUs?

Youjin1985 avatar Feb 20 '25 06:02 Youjin1985

You may need to look into running inference with something like DeepSpeed. Or try getting it running on CPU first, or use float16.

FireRedTeam avatar Feb 20 '25 13:02 FireRedTeam

After calling model.half(), I get an error in transcribe:

RuntimeError: Input type (float) and bias type (c10::Half) should be the same

How do I convert to float16?

Youjin1985 avatar Feb 21 '25 03:02 Youjin1985
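This error usually means only the model was cast to half precision while the input features are still float32; the likely fix is to cast the inputs to the model's dtype as well. A minimal sketch with a hypothetical stand-in layer (not the FireRedASR code):

```python
import torch
import torch.nn as nn

# Stand-in for the half-precision model: after .half(), its parameters are
# float16, so a float32 input triggers the dtype-mismatch RuntimeError.
model = nn.Linear(4, 2).half()
feats = torch.randn(1, 4)  # features still arrive as float32

# Fix: cast the input to whatever dtype the model's parameters use.
feats = feats.to(next(model.parameters()).dtype)
print(feats.dtype)  # torch.float16
```

In FireRedASR this would mean casting the fbank features right before they are fed to the model; the variable names above are placeholders.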

> You may need to look into running inference with something like DeepSpeed. Or try getting it running on CPU first, or use float16.

Does the LLM use float16? The technical report says the LLM was fine-tuned with LoRA; are those parameters in the encoder? @FireRedTeam

rookie0607 avatar Feb 21 '25 05:02 rookie0607

> You may need to look into running inference with something like DeepSpeed. Or try getting it running on CPU first, or use float16.

If I want to run inference with something like vLLM or DeepSpeed, how should I do it? Could you provide a script or an approach?

Youjin1985 avatar Feb 21 '25 07:02 Youjin1985
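There is no official multi-GPU script in this thread; one possible direction is an untested sketch using Hugging Face Accelerate to shard the weights across the two cards under a per-GPU memory budget. `load_checkpoint_and_dispatch` and `max_memory` are real Accelerate APIs, but wiring them into FireRedASR's custom loader is an assumption, not a recipe from this repo.

```python
# Per-device memory budget: leave headroom below the 24 GB capacity so that
# activations and the CUDA context still fit; overflow spills to CPU RAM.
max_memory = {0: "22GiB", 1: "22GiB", "cpu": "48GiB"}

try:
    from accelerate import load_checkpoint_and_dispatch  # pip install accelerate
    # Hypothetical usage -- `model` would be the instantiated FireRedASR-LLM-L:
    # model = load_checkpoint_and_dispatch(
    #     model,
    #     "pretrained_models/FireRedASR-LLM-L",
    #     device_map="auto",      # place layers across GPUs automatically
    #     max_memory=max_memory,  # respect the budget above
    # )
except ImportError:
    load_checkpoint_and_dispatch = None  # accelerate not installed
```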

> You may need to look into running inference with something like DeepSpeed. Or try getting it running on CPU first, or use float16.

> Does the LLM use float16? The technical report says the LLM was fine-tuned with LoRA; are those parameters in the encoder? @FireRedTeam

All of the FireRedASR-LLM-L weights are in the HF repo.

FireRedTeam avatar Feb 21 '25 14:02 FireRedTeam