mlx icon indicating copy to clipboard operation
mlx copied to clipboard

[BUG] Profiling out of time

Open yiakwy-xpu-ml-framework-team opened this issue 2 months ago • 1 comments

Describe the bug A clear and concise description of what the bug is.

When profliing gptoss models add_profiling_suppport, the process of profiling prefill becomes extremely slow, and finally throw timed out error.

I observed only 10 ~ 20 percent gpu utilization during the profiling process.

To Reproduce

Include code snippet

# server side : follow  https://github.com/ml-explore/mlx-lm/pull/601
MLX_MAX_CAPTURED_STEPS=3 MTL_CAPTURE_ENABLED=1 python -m mlx_lm.server --model "./gpt-oss-120b-MXFP4-Q4" --port 5001 --log-level DEBUG

# client side
# use curl to send a single request

Expected behavior A clear and concise description of what you expected to happen. Generate trace profile quickly.

Desktop (please complete the following information):

  • OS Version: [e.g. MacOS 14.1.2]
  • Version [e.g. 0.7.0]

Additional context Add any other context about the problem here.

@pcuenca Could you have a look at it ?