mini-sglang
Request-scoped Torch profiler via profile flag.
Summary
(only 94 lines of code)
Adds an opt-in, per-request profiling path: clients can send `"profile": true` and mini-sglang will start a `torch.profiler` session for that request, then export a Chrome-trace JSON to `/tmp` once the request finishes.
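Concretely, the flag could look like this on the request model. This is a simplified stand-in using a plain dataclass; the real `GenerateRequest` in mini-sglang may have more fields and a different base class:

```python
from dataclasses import dataclass


@dataclass
class GenerateRequest:
    # Simplified stand-in for mini-sglang's GenerateRequest; only the
    # fields relevant to this PR are shown.
    prompt: str
    max_tokens: int = 16
    profile: bool = False  # opt-in: profile this request with torch.profiler
```

Because the flag defaults to `False`, existing clients are unaffected; only requests that explicitly set `"profile": true` pay the profiling cost.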
What changed
- API: add `profile: bool = False` to `GenerateRequest` and `OpenAICompletionRequest`, and forward it into `TokenizeMsg` in `python/minisgl/server/api_server.py`.
- Messaging: propagate `profile` through tokenizer/backend messages (`TokenizeMsg`, `UserMsg`).
- Scheduler/core: carry `profile` into the internal `Req` object and start/stop profiling around the first in-flight request with `profile=True`.
- New utility: `python/minisgl/utils/profiler.py` implements `RequestProfiler` (start/stop/export).
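A minimal sketch of what `RequestProfiler` could look like. The class shape and defaults here are assumptions, not the actual `python/minisgl/utils/profiler.py`; only the output filename pattern is taken from the test command below:

```python
import os


class RequestProfiler:
    """Hedged sketch of a start/stop/export request profiler (details assumed)."""

    def __init__(self, out_dir: str = "/tmp"):
        self.out_dir = out_dir
        self._prof = None
        self._uid = None

    def trace_path(self, uid) -> str:
        # Matches the /tmp/minisgl-profile-uid*.json pattern used in the
        # test command in this PR description.
        return os.path.join(self.out_dir, f"minisgl-profile-uid{uid}.json")

    def start(self, uid) -> None:
        # Lazy import: requests without profile=True never touch torch.profiler.
        import torch.profiler

        self._uid = uid
        self._prof = torch.profiler.profile(
            activities=[
                torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.CUDA,
            ]
        )
        self._prof.__enter__()

    def stop(self) -> None:
        if self._prof is None:
            return
        self._prof.__exit__(None, None, None)
        self._prof.export_chrome_trace(self.trace_path(self._uid))
        self._prof = None
```

The exported JSON can be opened in `chrome://tracing` or Perfetto.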
How to use / test
```shell
# start server (your normal way)
# then:
curl -X POST http://127.0.0.1:1919/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"hello","max_tokens":64,"profile":true}'

# the Chrome trace appears once the request finishes
ls -lh /tmp/minisgl-profile-uid*.json
```
Thanks!
I'm wondering whether we should add a global-scoped ProfileReq like SGLang does. I'm uncertain about the pros and cons of a request-scoped profiler versus a global-scoped one. Personally, I think a separate ProfileReq offers better flexibility, because request-scoped profiling can be implemented on the client side on top of a global-scoped profiler.
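To illustrate that last point: a request-scoped profile can be emulated client-side on top of a global-scoped profiler. The `/start_profile` and `/stop_profile` endpoints below are hypothetical (modeled loosely on SGLang's ProfileReq), and the HTTP transport is injected as a callable so the sketch stays self-contained:

```python
def generate_with_profile(post, payload):
    """Wrap one /generate call in global start/stop profiling calls.

    `post(path, body)` is any callable that issues an HTTP POST and
    returns the decoded response; the endpoint names are hypothetical.
    """
    post("/start_profile", {})
    try:
        return post("/generate", payload)
    finally:
        # Always stop the profiler, even if /generate raised.
        post("/stop_profile", {})


# Exercise the wrapper with a fake transport that just records calls.
calls = []

def fake_post(path, body):
    calls.append(path)
    return {"text": "ok"}

out = generate_with_profile(fake_post, {"prompt": "hello"})
```

The trade-off cuts the other way too: the server-side request-scoped flag in this PR keeps the client to a single call and avoids races between concurrent clients sharing one global profiler.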