[Education] Offline benchmark performance of Qwen3-0.6B on MLX (CPU) and Modal (GPU)

Open • lamng3 opened this issue 4 days ago • 0 comments

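This issue compares the offline throughput of the Qwen3-0.6B model on two platforms: GPU inference on Modal (A10G) and CPU inference with MLX (Apple Silicon M1). The MLX run uses the 4-bit quantized checkpoint and the token totals differ between the runs, so the comparison is not strictly like-for-like.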

Modal

GPU: A10G
Model: Qwen/Qwen3-0.6B
Total: 133966 tok
Time: 44.10 s
Throughput: 3037.56 tok/s

MLX

CPU: Apple Silicon M1
Model: mlx-community/Qwen3-0.6B-4bit
Total: 140435 tok
Time: 1017.77 s
Throughput: 137.98 tok/s
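
For reference, the throughput figures above are total tokens divided by wall-clock time. The snippet below is a minimal sketch that recomputes them from the reported totals and derives the Modal-to-MLX ratio. The `RUNS` dictionary and the `throughput` helper are illustrative names, not part of the original benchmark script.

```python
# Figures copied from the two runs reported above.
RUNS = {
    "Modal (A10G)": {"tokens": 133966, "seconds": 44.10},
    "MLX (M1)": {"tokens": 140435, "seconds": 1017.77},
}


def throughput(tokens: int, seconds: float) -> float:
    """Return tokens per second for one benchmark run (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Recompute tok/s for each platform, then the ratio between them.
rates = {name: throughput(r["tokens"], r["seconds"]) for name, r in RUNS.items()}
speedup = rates["Modal (A10G)"] / rates["MLX (M1)"]

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} tok/s")
print(f"Modal / MLX throughput ratio: {speedup:.1f}x")
```

With these inputs the ratio comes out at roughly 22x. The recomputed Modal figure can differ from the reported one in the last digits, most likely because the reported time is rounded to two decimals.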

Dec 23 '25 09:12 lamng3