[BUG] Profiling out of time

Open yiakwy-xpu-ml-framework-team opened this issue 2 months ago • 1 comments

Describe the bug A clear and concise description of what the bug is.

When profliing gptoss models add_profiling_suppport, the process of profiling prefill becomes extremely slow, and finally throw timed out error.

I observed only 10 ~ 20 percent gpu utilization during the profiling process.

To Reproduce

Include code snippet

# server side : follow  https://github.com/ml-explore/mlx-lm/pull/601
MLX_MAX_CAPTURED_STEPS=3 MTL_CAPTURE_ENABLED=1 python -m mlx_lm.server --model "./gpt-oss-120b-MXFP4-Q4" --port 5001 --log-level DEBUG

# client side
# use curl to send a single request

Expected behavior A clear and concise description of what you expected to happen. Generate trace profile quickly.

Desktop (please complete the following information):

OS Version: [e.g. MacOS 14.1.2]
Version [e.g. 0.7.0]

Additional context Add any other context about the problem here.

Nov 11 '25 06:11 yiakwy-xpu-ml-framework-team

@pcuenca Could you have a look at it ?

Nov 11 '25 06:11 yiakwy-xpu-ml-framework-team