qiyue comments



qiyue

LLM model inference results are abnormal when fp16 is enabled

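The FireRedASR-LLM-L model is currently not a standard Hugging Face transformers model. Its custom loading logic lives in [fireredasr_llm.py](https://github.com/FireRedTeam/FireRedASR/blob/main/fireredasr/models/fireredasr_llm.py) and is controlled by args.use_flash_attn and args.use_fp16. Both parameters are currently 0, so the model defaults to torch.float32. On the GPU test environment (RTX 3090), inference cannot complete in that mode and fails with CUDA OutOfMemory.

Because the FireRedASR-LLM-L model is shipped in tar format and cannot be modified, I patched the FireRedASR source [fireredasr.py](https://github.com/FireRedTeam/FireRedASR/blob/main/fireredasr/models/fireredasr.py) directly and set use_fp16=1 so that inference runs in torch.float16. With float16 inference, however, every returned text is "%". This problem is already reported on GitHub in the issue [开启fp16推理结果异常 (abnormal inference results with fp16 enabled)](https://github.com/FireRedTeam/FireRedASR/issues/25).

After modifying the FireRedASR source [fireredasr_llm.py](https://github.com/FireRedTeam/FireRedASR/blob/main/fireredasr/models/fireredasr_llm.py) to pin the inference dtype with `inference_dtype = torch.bfloat16` before the `# Build LLM` step, the inference flow completed successfully. The LLM is then loaded with `llm = AutoModelForCausalLM.from_pretrained(args.llm_dir, attn_implementation=attn_implementation, torch_dtype=inference_dtype)`.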
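Instead of hard-coding bfloat16, the dtype choice could be made conditional, so that GPUs without bfloat16 support still fall back to float16. The following is only a minimal sketch of that idea, not code from FireRedASR: the helper name `pick_inference_dtype` and its `bf16_supported` parameter are hypothetical, and it returns the dtype name as a string so that it runs without PyTorch installed.

```python
def pick_inference_dtype(use_fp16: bool, bf16_supported: bool) -> str:
    """Return the name of the torch dtype to load the LLM with.

    Hypothetical helper, not part of FireRedASR.
    use_fp16:       mirrors args.use_fp16 (half precision requested).
    bf16_supported: whether the GPU supports bfloat16.
    """
    if not use_fp16:
        # Full precision: the current default, which ran out of memory
        # on the RTX 3090 in the test described above.
        return "float32"
    # Prefer bfloat16, which completed inference successfully in the test
    # above; fall back to float16 only when bfloat16 is unavailable.
    return "bfloat16" if bf16_supported else "float16"


if __name__ == "__main__":
    # Half precision requested on a GPU that supports bfloat16.
    print(pick_inference_dtype(use_fp16=True, bf16_supported=True))
```

In fireredasr_llm.py, the returned name could be mapped to a real dtype with `getattr(torch, name)`, and `bf16_supported` could be taken from `torch.cuda.is_bf16_supported()`. Note that the float16 fallback would still be subject to the "%" output problem described in the linked issue.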