mini-sglang

[Education] Offline benchmark performance of Qwen3-0.6B on MLX (CPU) and Modal (GPU)

Open • lamng3 opened this issue 4 days ago • 0 comments

This benchmark compares the offline throughput of the Qwen3-0.6B model on two platforms: GPU inference on Modal (A10G) and CPU inference with MLX (Apple Silicon M1). The MLX run uses a 4-bit quantized checkpoint, and the two runs processed slightly different token totals.

Modal

  • GPU: A10G
  • Model: Qwen/Qwen3-0.6B
  • Total: 133966 tok
  • Time: 44.10 s
  • Throughput: 3037.56 tok/s

MLX

  • CPU: Apple Silicon M1
  • Model: mlx-community/Qwen3-0.6B-4bit
  • Total: 140435 tok
  • Time: 1017.77 s
  • Throughput: 137.98 tok/s
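
As a rough sanity check, the throughput figures can be recomputed from the totals and times reported above, along with the ratio between the two runs. This is a minimal sketch: `BenchResult` and its field names are hypothetical helpers for illustration, not part of mini-sglang, MLX, or Modal. Because the reported times are rounded to two decimals, the recomputed values may differ slightly from the reported throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """Hypothetical container for one benchmark run (illustration only)."""
    name: str
    total_tokens: int
    seconds: float

    @property
    def throughput(self) -> float:
        # Tokens processed per second of wall-clock time.
        return self.total_tokens / self.seconds


# Figures copied from the two runs reported above.
modal = BenchResult("Modal (A10G)", 133966, 44.10)
mlx = BenchResult("MLX (M1, 4-bit)", 140435, 1017.77)

# Ratio of throughputs; the runs use different checkpoints and token
# totals, so this is an approximate comparison only.
speedup = modal.throughput / mlx.throughput

for run in (modal, mlx):
    print(f"{run.name}: {run.throughput:.2f} tok/s")
print(f"Modal / MLX throughput ratio: {speedup:.1f}x")
```

On these numbers the Modal run comes out at roughly 22 times the MLX throughput.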

lamng3, Dec 23 '25 09:12