
Evaluation of whether MiniCPM-2B-sft-bf16 needs model-based optimization in ipex-llm


Below are the benchmark results for both THUDM/chatglm3-6b and openbmb/MiniCPM-2B-sft-bf16. They show that chatglm3-6b has lower rest-token latency (i.e., better decode throughput) than MiniCPM-2B at the longer input/output lengths. Since MiniCPM-2B is a 2B model while chatglm3-6b is a 6B model, I'm not sure whether these results are normal or whether further optimization should be done for MiniCPM-2B.

Platform: Core Ultra 7 165H, 2×32 GB = 64 GB DDR5 5600 MT/s, Ubuntu 22.04, ipex-llm 2.1.0b20240526

| Model | Data type | Batch | Input tokens | Output tokens | First token latency (ms) | Rest token latency (ms/token) |
|---|---|---|---|---|---|---|
| THUDM/chatglm3-6b | INT4-SYM | 1 | 32 | 32 | 468.38 | 46.98 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 128 | 3997.21 | 48.87 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 1024 | 3987.13 | 49.31 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 2048 | 1024 | 8607.79 | 50.25 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 32 | 32 | 258.62 | 44.72 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 128 | 2602 | 65.81 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 1024 | 2720.03 | 80.4 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 2048 | 1024 | 6910.26 | 112.05 |
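For reference, here is a minimal sketch of how first-token and rest-token latencies like these can be measured with ipex-llm's INT4 loading. This is illustrative, not the exact script used for the numbers above: the prompt construction and token counts are placeholders, and measuring first-token latency via a separate one-token generation is an approximation.

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "openbmb/MiniCPM-2B-sft-bf16"  # or "THUDM/chatglm3-6b"

# load_in_4bit=True applies ipex-llm's symmetric INT4 weight quantization
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Tell me about Intel. " * 8  # placeholder; pad toward the target input length
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

out_tokens = 32

with torch.inference_mode():
    # Warm-up run so one-time initialization costs don't skew the numbers
    model.generate(input_ids, max_new_tokens=out_tokens)

    # First-token latency: time a generation that stops after one new token
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1)
    first_token_ms = (time.perf_counter() - t0) * 1000

    # Rest-token latency: time the full generation, subtract the first-token
    # time, and average over the remaining tokens
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=out_tokens)
    total_ms = (time.perf_counter() - t0) * 1000
    rest_token_ms = (total_ms - first_token_ms) / (out_tokens - 1)

print(f"first token: {first_token_ms:.2f} ms, rest token: {rest_token_ms:.2f} ms/token")
```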

wluo1007 · May 29, 2024, 03:05