mlx
mlx copied to clipboard
[BUG] Profiling out of time
Describe the bug A clear and concise description of what the bug is.
When profliing gptoss models add_profiling_suppport, the process of profiling prefill becomes extremely slow, and finally throw timed out error.
I observed only 10 ~ 20 percent gpu utilization during the profiling process.
To Reproduce
Include code snippet
# server side : follow https://github.com/ml-explore/mlx-lm/pull/601
MLX_MAX_CAPTURED_STEPS=3 MTL_CAPTURE_ENABLED=1 python -m mlx_lm.server --model "./gpt-oss-120b-MXFP4-Q4" --port 5001 --log-level DEBUG
# client side
# use curl to send a single request
Expected behavior A clear and concise description of what you expected to happen. Generate trace profile quickly.
Desktop (please complete the following information):
- OS Version: [e.g. MacOS 14.1.2]
- Version [e.g. 0.7.0]
Additional context Add any other context about the problem here.
@pcuenca Could you have a look at it ?