mini-sglang

Request-scoped Torch profiler via profile flag.

Open · AdamLouly opened this issue 1 week ago · 1 comment

Summary (only 94 lines of code): Adds an opt-in, per-request profiling path. Clients can send "profile": true, and mini-sglang will start a torch.profiler session for that request, then export a Chrome-trace JSON to /tmp once the request finishes.

What changed

  • API: add profile: bool = False to GenerateRequest and OpenAICompletionRequest, and forward it into TokenizeMsg in python/minisgl/server/api_server.py.
  • Messaging: propagate profile through tokenizer/backend messages (TokenizeMsg, UserMsg).
  • Scheduler/core: carry profile into the internal Req object and start/stop profiling around the first in-flight request with profile=True.
  • New utility: python/minisgl/utils/profiler.py implements RequestProfiler (start/stop/export).
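The pieces above can be sketched as follows. This is a minimal illustration, not the actual python/minisgl/utils/profiler.py: the class name RequestProfiler and the start/stop/export lifecycle come from the description above, but every field and signature here is an assumption. To keep the sketch self-contained, the profiler backend is injectable; the real utility would presumably construct torch.profiler.profile(...) and call its export_chrome_trace(path) method.

```python
import os
import uuid


class RequestProfiler:
    """Sketch of a per-request profiler (assumed interface: start, then
    stop-and-export a Chrome trace). `make_profiler` is an injection
    point standing in for torch.profiler.profile so the sketch runs
    without torch installed."""

    def __init__(self, out_dir="/tmp", make_profiler=None):
        self.out_dir = out_dir
        # In the real utility this would default to a torch.profiler
        # factory, e.g. lambda: torch.profiler.profile(activities=[...]).
        self._make_profiler = make_profiler
        self._prof = None
        self.uid = uuid.uuid4().hex[:8]  # hypothetical uid scheme

    def start(self):
        # Start profiling around the first in-flight request that
        # carries profile=True; ignore repeated starts.
        if self._prof is None:
            self._prof = self._make_profiler()
            self._prof.__enter__()

    def stop_and_export(self):
        # Stop the session and export a Chrome-trace JSON under /tmp.
        if self._prof is None:
            return None
        self._prof.__exit__(None, None, None)
        path = os.path.join(
            self.out_dir, f"minisgl-profile-uid{self.uid}.json"
        )
        # torch.profiler exposes export_chrome_trace(path) for this.
        self._prof.export_chrome_trace(path)
        self._prof = None
        return path
```

The scheduler would call start() when it first schedules a profile=True request and stop_and_export() when that request finishes, matching the per-request scope described above.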

How to use / test

# start server (your normal way)
# then:
curl -X POST http://127.0.0.1:1919/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"hello","max_tokens":64,"profile":true}'

ls -lh /tmp/minisgl-profile-uid*.json

AdamLouly · Dec 18 '25 13:12

Thanks!

I'm wondering whether we should add a globally scoped ProfileReq, just like SGLang. I'm uncertain about the pros and cons of a request-scoped profiler versus a global-scoped one. Personally, I think a separate ProfileReq can offer better flexibility, because request-scoped profiling can be implemented on the client side on top of a global-scoped profiler.
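The client-side composition mentioned here could look roughly like this. It assumes the server exposes global start/stop profiling endpoints (SGLang servers expose /start_profile and /stop_profile; the exact routes for mini-sglang would be whatever the ProfileReq design lands on), and `post` stands in for an HTTP call:

```python
from contextlib import contextmanager


@contextmanager
def profiled_request(post):
    """Scope a profile to one request by bracketing it with global
    start/stop calls. `post` is any callable that issues the request,
    e.g. lambda path: requests.post(base_url + path)."""
    post("/start_profile")  # assumed global-start endpoint
    try:
        yield
    finally:
        post("/stop_profile")  # assumed global-stop endpoint


# Usage sketch: the /generate call in the middle is the one profiled.
# with profiled_request(post):
#     post("/generate")
```

The trade-off is that a client-side scope also captures any other traffic the server handles concurrently, which is one reason an in-server per-request flag can still be attractive.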

DarkSharpness · Dec 20 '25 20:12