[Bad Case]: MiniCPM3-4B performs poorly when served with sglang

Open wyy007 opened this issue 1 year ago • 0 comments

Description

Test environment:
OS: Ubuntu 22.04.4 LTS, kernel 5.15.0-122-generic
Driver: ROCm 6.1.2
CPU: Intel(R) Xeon(R) w5-3435X, 16 cores
Memory: 256 GB, 4800 MT/s
GPU: AMD 7900XTX (full-spec card), FP16: 123 TFLOPS, VRAM: 24 GB, memory bandwidth: 964 GB/s

Case Explanation

Three models were tested: MiniCPM3-4B, MiniCPM3-4B-GPTQ-Int4, and Qwen2.5-7B-Instruct-GPTQ-Int8. The launch commands were as follows.

MiniCPM3-4B: HIP_VISIBLE_DEVICES=1 python3 -m sglang.launch_server --model-path /root/.cache/modelscope/MiniCPM3-4B --port 30000 --mem-fraction-static 0.8 --kv-cache-dtype auto --attention-backend triton --sampling-backend pytorch --trust-remote-code --schedule-conservativeness 0.1 --disable-cuda-graph --enable-torch-compile

MiniCPM3-4B-GPTQ-Int4: HIP_VISIBLE_DEVICES=1 python3 -m sglang.launch_server --model-path /root/.cache/modelscope/MiniCPM3-4B-GPTQ-Int4 --port 30000 --mem-fraction-static 0.8 --kv-cache-dtype auto --attention-backend triton --sampling-backend pytorch --trust-remote-code --schedule-conservativeness 0.1 --disable-cuda-graph --enable-torch-compile --quantization gptq --disable-mla

Qwen2.5-7B-Instruct-GPTQ-Int8: HIP_VISIBLE_DEVICES=1 python3 -m sglang.launch_server --model-path /root/.cache/modelscope/Qwen2.5-7B-Instruct-GPTQ-Int8 --port 30000 --mem-fraction-static 0.8 --kv-cache-dtype auto --attention-backend triton --sampling-backend pytorch --trust-remote-code --schedule-conservativeness 0.1 --disable-cuda-graph --enable-torch-compile --quantization gptq --disable-mla
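For anyone trying to reproduce the comparison, the following is a minimal sketch of one way to time a single request against any of the servers above. It is not the method used for the attached results. It assumes sglang's native `/generate` endpoint on port 30000 and that the response includes `meta_info.completion_tokens`; the helper names `tokens_per_second` and `measure` and the default prompt are made up for illustration.

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """End-to-end throughput for one request; 0.0 if no time elapsed."""
    return completion_tokens / elapsed_s if elapsed_s > 0 else 0.0


def measure(base_url: str = "http://127.0.0.1:30000",
            prompt: str = "Explain what a KV cache is.",
            max_new_tokens: int = 256) -> float:
    """Send one request to the server's /generate endpoint and time it."""
    payload = json.dumps({
        "text": prompt,
        "sampling_params": {"temperature": 0, "max_new_tokens": max_new_tokens},
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    # Assumption: the response carries meta_info.completion_tokens;
    # fall back to the requested length if the field is absent.
    generated = body.get("meta_info", {}).get("completion_tokens", max_new_tokens)
    # The elapsed time includes prefill, so this is not pure decode speed.
    return tokens_per_second(generated, elapsed)
```

Because the servers are started with `--enable-torch-compile`, the first request may be much slower than later ones, so it is probably worth calling `measure()` once as a warm-up and reporting only the subsequent calls.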

Judging from the performance results, MiniCPM3-4B is slower than Qwen2.5-7B-Instruct-GPTQ-Int8. Could you help identify where the problem lies?
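As a rough sanity check on what to expect, here is a back-of-the-envelope sketch of the decode speed ceiling for each model on this GPU. It assumes single-stream decoding is limited by reading every weight once per generated token, ignores KV-cache and activation traffic, takes the 964 figure from the environment description as GB/s, and uses the nominal parameter counts from the model names (4e9 and 7e9), which are approximations rather than exact sizes.

```python
# Rough upper bound on single-stream decode speed, assuming decoding is
# limited by reading every weight once per generated token.
BANDWIDTH_BYTES_PER_S = 964e9  # from the environment description (assumed GB/s)

# Nominal parameter counts taken from the model names (assumption),
# paired with the bytes per weight implied by each format.
MODELS = {
    "MiniCPM3-4B (16-bit)": (4e9, 2.0),
    "MiniCPM3-4B-GPTQ-Int4": (4e9, 0.5),
    "Qwen2.5-7B-Instruct-GPTQ-Int8": (7e9, 1.0),
}


def max_decode_tokens_per_s(params: float, bytes_per_weight: float,
                            bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Bandwidth divided by the weight bytes read per generated token."""
    return bandwidth / (params * bytes_per_weight)


estimates = {name: max_decode_tokens_per_s(p, b) for name, (p, b) in MODELS.items()}
for name, tps in estimates.items():
    print(f"{name}: ~{tps:.1f} tokens/s upper bound")
```

Under these assumptions the ceilings come out at about 120.5, 482, and 137.7 tokens/s respectively. The 16-bit 4B model reads roughly 8 GB of weights per token versus roughly 7 GB for the 7B Int8 model, so a somewhat lower speed for unquantized MiniCPM3-4B would not by itself be surprising. This estimate would not account for the Int4 variant being slower than Qwen2.5-7B-Instruct-GPTQ-Int8.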

[Attached screenshots: 20241127-165726, 20241127-165815, 20241127-165834]

Nov 27 '24 09:11 wyy007