Evaluation of whether MiniCPM-2B-sft-bf16 needs model-based optimization in ipex-llm
Below are benchmark results for both THUDM/chatglm3-6b and openbmb/MiniCPM-2B-sft-bf16. They show that chatglm3-6b sustains better rest-token latency than MiniCPM-2B as the input grows: at 2048 input tokens, MiniCPM-2B's rest-token latency (112.05 ms/token) is more than double chatglm3-6b's (50.25 ms/token). Since MiniCPM-2B is a 2B model while chatglm3-6b is a 6B model, I'm not sure whether these results are normal or whether further optimization should be done for MiniCPM-2B.
Platform: Intel Core Ultra 7 165H, 2 × 32 GB = 64 GB DDR5 5600 MT/s, Ubuntu 22.04, ipex-llm 2.1.0b20240526
| Model | Data type | Batch | Input tokens | Output tokens | First token latency (ms) | Rest token latency (ms/token) |
|---|---|---|---|---|---|---|
| THUDM/chatglm3-6b | INT4-SYM | 1 | 32 | 32 | 468.38 | 46.98 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 128 | 3997.21 | 48.87 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 1024 | 3987.13 | 49.31 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 2048 | 1024 | 8607.79 | 50.25 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 32 | 32 | 258.62 | 44.72 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 128 | 2602 | 65.81 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 1024 | 2720.03 | 80.4 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 2048 | 1024 | 6910.26 | 112.05 |
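For reference, below is a minimal sketch of how first-token and rest-token latencies like these can be measured with ipex-llm's transformers-style API and its default INT4 symmetric quantization (`load_in_4bit=True`). This is an illustrative reproduction only, not the exact harness used for the numbers above; it runs on CPU for simplicity, whereas the original runs may have used a dedicated benchmark script and/or the 165H iGPU.

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Either model from the table; chatglm3-6b may need AutoModel instead of
# AutoModelForCausalLM depending on the transformers version.
model_path = "openbmb/MiniCPM-2B-sft-bf16"

# load_in_4bit=True applies ipex-llm's default sym_int4 quantization
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Tell me a story. " * 64  # pad to roughly the target input length
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.inference_mode():
    # First-token latency: prefill plus a single generated token
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1)
    first_token_ms = (time.perf_counter() - t0) * 1000

    # Rest-token latency: amortize the remaining tokens of a longer run
    n_new = 32
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=n_new)
    total_ms = (time.perf_counter() - t0) * 1000
    rest_ms = (total_ms - first_token_ms) / (n_new - 1)

print(f"first token: {first_token_ms:.2f} ms, rest: {rest_ms:.2f} ms/token")
```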