mini-sglang
[Education] Offline benchmark performance of Qwen3-0.6B on MLX (CPU) and Modal (GPU)
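This PR compares offline benchmark performance of the Qwen3-0.6B model on two platforms: GPU inference on Modal (NVIDIA A10G) and CPU inference with MLX on an Apple Silicon M1. Note that the MLX run uses a 4-bit quantized checkpoint (mlx-community/Qwen3-0.6B-4bit), while the Modal run uses the unquantized Qwen/Qwen3-0.6B checkpoint, so the comparison differs in both hardware and weight precision.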
Modal Perf
- GPU: A10G
- Model: Qwen/Qwen3-0.6B
- Total: 133,966 tok
- Time: 44.10 s
- Throughput: 3037.56 tok/s
MLX Perf
- CPU: Apple Silicon M1
- Model: mlx-community/Qwen3-0.6B-4bit
- Total: 140,435 tok
- Time: 1017.77 s
- Throughput: 137.98 tok/s
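Because the two runs processed different token totals (133,966 vs 140,435), tokens per second is the fair basis for comparison. The sketch below recomputes throughput from the reported figures, assuming (this is not confirmed by the benchmark output itself) that Throughput is Total divided by Time; the reported numbers are consistent with that up to rounding of Time. `BenchResult` is a hypothetical helper for illustration, not part of mini-sglang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One offline benchmark run (hypothetical helper, for illustration only)."""

    platform: str
    total_tokens: int
    seconds: float

    @property
    def throughput(self) -> float:
        # Assumption: throughput = total tokens / elapsed wall-clock seconds.
        return self.total_tokens / self.seconds


# Figures copied from the results reported above.
modal = BenchResult("Modal A10G (Qwen/Qwen3-0.6B)", 133_966, 44.10)
mlx = BenchResult("MLX M1 (mlx-community/Qwen3-0.6B-4bit)", 140_435, 1017.77)

for run in (modal, mlx):
    print(f"{run.platform}: {run.throughput:.2f} tok/s")

# Relative throughput of the GPU run over the CPU run.
speedup = modal.throughput / mlx.throughput
print(f"Modal / MLX throughput ratio: {speedup:.1f}x")
```

Recomputing from the rounded times gives about 3037.78 tok/s for Modal (slightly above the reported 3037.56, because the logged Time is rounded to two decimals) and 137.98 tok/s for MLX, so the A10G run delivers roughly 22× the throughput of the M1 run.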