MiniCPM-V
Question about int4 vs. bfloat16 inference time (urgent)
I tested the MiniCPM-2B-dpo-bf16 and MiniCPM-dpo-Int4 models with the following code. Inference with MiniCPM-2B-dpo-bf16 takes a bit over 3 seconds, while MiniCPM-dpo-Int4 takes more than 10 seconds. What is the reason for this?
When using the int4 model, you should remove torch_dtype=torch.float16 from AutoModelForCausalLM.from_pretrained(). It also runs faster with vLLM, which already supports MiniCPM.
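For reference, running the bf16 model through vLLM could look roughly like the sketch below. This is only an illustrative sketch, not something posted in this thread: the model path, prompt string, and max_tokens value are placeholders, and the sampling values simply mirror the ones used later in the thread.

from vllm import LLM, SamplingParams

# Placeholder model path; substitute your local checkpoint or Hugging Face repo id.
llm = LLM(model="openbmb/MiniCPM-2B-dpo-bf16", trust_remote_code=True, dtype="bfloat16")

params = SamplingParams(temperature=0.3, top_p=0.8, repetition_penalty=1.05, max_tokens=512)

# Placeholder prompt; format it according to the model card's chat template.
outputs = llm.generate(["your prompt here"], params)
print(outputs[0].outputs[0].text)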
Thanks for the reply. I changed the model loading as follows:

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

path = '/home/sft_int4'
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, device_map="cuda", trust_remote_code=True).eval()

start_time = time.time()
# input_text is the prompt, defined elsewhere.
responds, history = model.chat(tokenizer, input_text, temperature=0.3, top_p=0.8, repetition_penalty=1.05)
print(responds)
print(time.time() - start_time)

Inference still takes more than 10 seconds, which is noticeably longer than bfloat16. Could you help explain why? @iceflame89
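An aside, not part of the original exchange: the very first model.chat call also pays one-time warm-up costs (CUDA initialization, memory allocation), and the number of generated tokens varies between runs, so timing a single call can exaggerate the gap. A hedged sketch of a slightly fairer measurement, reusing the model, tokenizer, and input_text from the snippet above:

import time
import torch

# Warm-up call so one-time initialization is not counted in the measurement.
model.chat(tokenizer, input_text, temperature=0.3, top_p=0.8, repetition_penalty=1.05)

runs = 5
torch.cuda.synchronize()  # make sure pending GPU work has finished before starting the clock
start_time = time.time()
for _ in range(runs):
    responds, history = model.chat(tokenizer, input_text, temperature=0.3, top_p=0.8, repetition_penalty=1.05)
torch.cuda.synchronize()
print((time.time() - start_time) / runs)  # average seconds per generation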
Hello. I also ran one round of inference on both models with a similar prompt format. With broadly similar output, the int4 model took about 5.5 s and the bf16 model about 2-3 s. This behavior is expected: int4 inference only compresses the parameters. During the actual computation, the weights are currently dequantized back to floating point, and quantized computation is not yet supported. The situation is basically the same for large models in general at the moment.
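To illustrate that point, here is a toy sketch of weight-only quantization (purely illustrative, not MiniCPM's actual kernels or quantization scheme): the weights are stored as low-bit integers plus a scale, but they are dequantized back to floating point before each matrix multiplication, so the matmul itself saves no time and the extra dequantization step adds overhead.

import torch

def quantize_per_channel(w, n_bits=4):
    # Symmetric per-output-channel quantization: keep int values plus a float scale.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)  # int8 container for 4-bit values
    return q, scale

def int4_linear(x, q, scale, bias=None):
    # Dequantize before the matmul: the multiply still runs in floating point,
    # which is why weight-only quantization saves memory but not necessarily latency.
    w = q.to(x.dtype) * scale
    return torch.nn.functional.linear(x, w, bias)

w = torch.randn(8, 16)
x = torch.randn(2, 16)
q, scale = quantize_per_channel(w)
print((int4_linear(x, q, scale) - x @ w.t()).abs().max())  # small quantization error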
For questions of this kind, please ask in our language model repository https://github.com/OpenBMB/MiniCPM, since most of our language-model optimization work lives there. If you have more questions about MiniCPM-V, feel free to ask here.