Evaluation of whether MiniCPM-2B-sft-bf16 needs model-based optimization in ipex-llm
Below are benchmark results for both THUDM/chatglm3-6b and openbmb/MiniCPM-2B-sft-bf16. They show that chatglm3-6b sustains better rest-token latency than MiniCPM-2B as the input grows: at 2048 input tokens, MiniCPM-2B's rest-token latency (112.05 ms/token) is more than double chatglm3-6b's (50.25 ms/token). Since MiniCPM-2B is a 2B model while chatglm3-6b is a 6B model, I'm not sure whether these results are normal or whether further optimization should be done for MiniCPM-2B.
Platform: Intel Core Ultra 7 165H, 2 × 32 GB = 64 GB DDR5 5600 MT/s, Ubuntu 22.04, ipex-llm 2.1.0b20240526
| Model | Data type | Batch | Input tokens | Output tokens | First token latency (ms) | Rest token latency (ms/token) |
|---|---|---|---|---|---|---|
| THUDM/chatglm3-6b | INT4-SYM | 1 | 32 | 32 | 468.38 | 46.98 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 128 | 3997.21 | 48.87 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 1024 | 1024 | 3987.13 | 49.31 |
| THUDM/chatglm3-6b | INT4-SYM | 1 | 2048 | 1024 | 8607.79 | 50.25 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 32 | 32 | 258.62 | 44.72 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 128 | 2602 | 65.81 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 1024 | 1024 | 2720.03 | 80.4 |
| openbmb/MiniCPM-2B-sft-bf16 | INT4-SYM | 1 | 2048 | 1024 | 6910.26 | 112.05 |
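For reference, below is a minimal sketch of how first-token and rest-token latencies like these can be measured with ipex-llm's transformers-style API and its default INT4 symmetric quantization (`load_in_4bit=True`). This is an illustrative reproduction only, not the exact harness used for the numbers above; it runs on CPU for simplicity, whereas the original runs may have used a dedicated benchmark script and/or the 165H iGPU.

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Either model from the table; chatglm3-6b may need AutoModel instead of
# AutoModelForCausalLM depending on the transformers version.
model_path = "openbmb/MiniCPM-2B-sft-bf16"

# load_in_4bit=True applies ipex-llm's default sym_int4 quantization
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Tell me a story. " * 64  # pad to roughly the target input length
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.inference_mode():
    # First-token latency: prefill plus a single generated token
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=1)
    first_token_ms = (time.perf_counter() - t0) * 1000

    # Rest-token latency: amortize the remaining tokens of a longer run
    n_new = 32
    t0 = time.perf_counter()
    model.generate(input_ids, max_new_tokens=n_new)
    total_ms = (time.perf_counter() - t0) * 1000
    rest_ms = (total_ms - first_token_ms) / (n_new - 1)

print(f"first token: {first_token_ms:.2f} ms, rest: {rest_ms:.2f} ms/token")
```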