
[Bad Case]: MiniCPM3-4B performance is underwhelming when served with sglang

Open wyy007 opened this issue 1 year ago • 0 comments

Description

Test environment:
OS: Ubuntu 22.04.4 LTS, kernel 5.15.0-122-generic
Driver: ROCm 6.1.2
CPU: 16-core Intel(R) Xeon(R) w5-3435X
Memory: 256 GB, 4800 MT/s
GPU: AMD 7900XTX, FP16: 123 TFLOPS, VRAM: 24 GB, memory bandwidth: 964 GB/s (full-spec card)

Case Explanation

I tested three models: MiniCPM3-4B, MiniCPM3-4B-GPTQ-Int4, and Qwen2.5-7B-Instruct-GPTQ-Int8. The launch command for each is given below.

MiniCPM3-4B:
HIP_VISIBLE_DEVICES=1 python3 -m sglang.launch_server --model-path /root/.cache/modelscope/MiniCPM3-4B --port 30000 --mem-fraction-static 0.8 --kv-cache-dtype auto --attention-backend triton --sampling-backend pytorch --trust-remote-code --schedule-conservativeness 0.1 --disable-cuda-graph --enable-torch-compile

MiniCPM3-4B-GPTQ-Int4:
HIP_VISIBLE_DEVICES=1 python3 -m sglang.launch_server --model-path /root/.cache/modelscope/MiniCPM3-4B-GPTQ-Int4 --port 30000 --mem-fraction-static 0.8 --kv-cache-dtype auto --attention-backend triton --sampling-backend pytorch --trust-remote-code --schedule-conservativeness 0.1 --disable-cuda-graph --enable-torch-compile --quantization gptq --disable-mla

Qwen2.5-7B-Instruct-GPTQ-Int8:
HIP_VISIBLE_DEVICES=1 python3 -m sglang.launch_server --model-path /root/.cache/modelscope/Qwen2.5-7B-Instruct-GPTQ-Int8 --port 30000 --mem-fraction-static 0.8 --kv-cache-dtype auto --attention-backend triton --sampling-backend pytorch --trust-remote-code --schedule-conservativeness 0.1 --disable-cuda-graph --enable-torch-compile --quantization gptq --disable-mla

Judging by the performance results, MiniCPM3-4B performs worse than Qwen2.5-7B-Instruct-GPTQ-Int8. Could you help look into what is causing this?
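For context when reading the numbers, here is a rough back-of-the-envelope sketch of the memory-bandwidth ceiling for single-stream decoding on this card. It assumes decoding is limited purely by reading every weight once per generated token, and it uses approximate parameter counts (about 4B and about 7.6B) and nominal bytes per weight that are my own assumptions, not measured values. It ignores the KV cache, quantization scales, and kernel efficiency, so real throughput will be lower.

```python
# Rough bandwidth-bound ceiling for batch-size-1 decoding.
# Assumption: every weight is read once per generated token and nothing
# else (KV cache, activations, quantization scales) costs bandwidth.

BANDWIDTH_GB_S = 964.0  # memory bandwidth from the test environment above

# (approx. parameters in billions, nominal bytes per weight): both are
# illustrative assumptions, not values taken from the model cards.
MODELS = {
    "MiniCPM3-4B (FP16)": (4.0, 2.0),
    "MiniCPM3-4B-GPTQ-Int4": (4.0, 0.5),
    "Qwen2.5-7B-Instruct-GPTQ-Int8": (7.6, 1.0),
}


def decode_ceiling_tokens_per_s(params_billion, bytes_per_weight,
                                bandwidth_gb_s=BANDWIDTH_GB_S):
    """Upper bound on tokens/s if decoding only streams the weights."""
    weight_gb = params_billion * bytes_per_weight  # weight footprint in GB
    return bandwidth_gb_s / weight_gb


for name, (params_b, bytes_per_w) in MODELS.items():
    ceiling = decode_ceiling_tokens_per_s(params_b, bytes_per_w)
    print(f"{name}: weights ~{params_b * bytes_per_w:.1f} GB, "
          f"ceiling ~{ceiling:.0f} tokens/s")
```

Under these assumptions the FP16 4B model and the Int8 7B model have a similar weight footprint, so a similar single-stream decode ceiling would not be surprising by itself; the Int4 model is the one where a large gap to this ceiling would be worth investigating.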

[Attached screenshots: 20241127-165726, 20241127-165815, 20241127-165834]

wyy007, Nov 27 '24 09:11