FireRedASR-LLM-L CUDA out of memory
According to nvidia-smi, I have 24 GB free on each of four RTX 4090s. Still, when I run
speech2text.py --wav_path examples/wav/BAC009S0764W0121.wav --asr_type "llm" --model_dir pretrained_models/FireRedASR-LLM-L
I get
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 260.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 88.69 MiB is free. Including non-PyTorch memory, this process has 23.54 GiB memory in use. Of the allocated memory 23.15 GiB is allocated by PyTorch, and 14.12 MiB is reserved by PyTorch but unallocated.
Also, the code seems to ignore the CUDA_VISIBLE_DEVICES environment variable.
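One thing worth checking: CUDA_VISIBLE_DEVICES only takes effect if it is set before PyTorch initializes CUDA, so exporting it after the process has already touched a GPU does nothing. A minimal sanity check, independent of FireRedASR:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must happen before any CUDA call

import torch
print(torch.cuda.device_count())  # 1 if the mask took effect, 4 otherwise
```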
I found that the model takes roughly 32 GB. How can I run it on two 24 GB GPUs?
You probably need to look into running inference with something like DeepSpeed. Or try getting it running on CPU first. Or use float16.
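For the two-GPU question: I have not tried this with FireRedASR, but since the technical report says the LLM is Qwen2-7B-Instruct, the usual trick for the LLM half is transformers/accelerate's `device_map="auto"`, which shards the layers across all visible cards. A rough sketch only; the Qwen2 path below is a guess, and the sharded module would still have to be patched back into the repo's model object:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical path: point this at wherever the Qwen2 weights actually live.
llm_dir = "pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct"

# device_map="auto" lets accelerate spread the fp16 LLM weights across
# both 24 GB cards instead of loading everything onto GPU 0.
llm = AutoModelForCausalLM.from_pretrained(
    llm_dir,
    torch_dtype=torch.float16,
    device_map="auto",
)
print(llm.hf_device_map)  # shows which layers landed on which GPU
```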
After I call model.half(), transcribe errors out with:
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
How do I convert it to float16?
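Not from the maintainers, but that RuntimeError just means the weights went to fp16 while the input features stayed fp32, so after model.half() the feature tensor has to be cast as well. A self-contained reproduction of the mismatch and the fix (CUDA required):

```python
import torch

# fp16 weights, fp32 input: the same mismatch transcribe() hits.
conv = torch.nn.Conv1d(80, 256, kernel_size=3).cuda().half()
feats = torch.randn(1, 80, 100, device="cuda")  # fp32, like the extracted features

# conv(feats) raises: Input type (float) and bias type (c10::Half) ...
out = conv(feats.half())  # casting the input to fp16 fixes it
print(out.dtype)  # torch.float16
```

In the repo that means casting whatever tensor feeds the encoder inside transcribe (or wherever the features are built) after calling model.half().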
Is the LLM run in float16? The technical report says the LLM was fine-tuned with LoRA; are those parameters in the encoder? @FireRedTeam
If I want to run inference with vLLM or DeepSpeed, how should I do it? Can you provide a script or a recipe?
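Not an official script, but the generic DeepSpeed AutoTP pattern for the LLM sub-module looks roughly like this. The path is hypothetical, kernel support for Qwen2 depends on your DeepSpeed version, and the audio encoder plus the repo's decoding loop still have to be wired in around it:

```python
# Launch with: deepspeed --num_gpus 2 ds_infer.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM

llm_dir = "pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct"  # hypothetical path

model = AutoModelForCausalLM.from_pretrained(llm_dir, torch_dtype=torch.float16)

# Tensor-parallel inference: each of the 2 GPUs holds roughly half of
# every layer's weights, so the fp16 LLM fits on two 24 GB cards.
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
)
```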
Is the LLM run in float16? The technical report says the LLM was fine-tuned with LoRA; are those parameters in the encoder? @FireRedTeam
All of FireRedASR-LLM-L's weights are in the HF repo.
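If it helps to verify where the LoRA parameters ended up, listing the checkpoint's parameter names should settle it; the filename below is a guess, so substitute whatever the HF repo actually ships:

```python
import torch

# Hypothetical filename: check the actual checkpoint name in the HF repo.
ckpt = torch.load("pretrained_models/FireRedASR-LLM-L/model.pth.tar",
                  map_location="cpu")
state = ckpt.get("model_state_dict", ckpt)  # checkpoints sometimes nest the state dict

# Parameter names reveal whether LoRA weights sit in the encoder,
# in the LLM, or have already been merged into the base weights.
for name, tensor in state.items():
    if "lora" in name.lower():
        print(name, tuple(tensor.shape))
```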