feat: log input text for OpenAI format API
Motivation
To support https://github.com/sgl-project/sglang/issues/1608
When using the OpenAI-compatible API with `--log-requests` enabled, the input text in the log is unreadable:
[2025-02-19 10:07:29] Finish: obj=GenerateReqInput(text=None, input_ids=[0, 87979, 11403, 8367, 4697, 30, 59812, 2923, 1018, 290, 18594, 303, 882, 11743, 4431, 32414, 1175, 9484, 5802, 1923, 19223, 8745, 8745, 303, 87825, 16465, 621, 126725, 1175, 2792, 303, 20808...
Modifications
I think I found the root cause: when `input_ids` is filled with data, `obj.text` will be `None`.
As a result, in the `--log-requests` output, `dataclass_to_string_truncated(obj: GenerateReqInput)` shows `obj.text` as `None` and only prints `obj.input_ids` as encoded token IDs.
Please correct me if I'm wrong.
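A minimal sketch of the idea behind the fix (the helper name `fill_text_for_logging` and the toy tokenizer are hypothetical; the real code would use the server's actual tokenizer and the real `GenerateReqInput` dataclass):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GenerateReqInput:
    text: Optional[str] = None
    input_ids: Optional[List[int]] = None

class ToyTokenizer:
    """Hypothetical stand-in for the model's real tokenizer."""
    vocab = {0: "Hello", 1: "world"}

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab.get(i, "<unk>") for i in ids)

def fill_text_for_logging(obj: GenerateReqInput, tokenizer) -> GenerateReqInput:
    # When the OpenAI-format path filled only input_ids, decode them back
    # so --log-requests prints readable text instead of raw token IDs.
    if obj.text is None and obj.input_ids is not None:
        obj.text = tokenizer.decode(obj.input_ids)
    return obj

req = fill_text_for_logging(GenerateReqInput(input_ids=[0, 1]), ToyTokenizer())
print(req.text)  # -> Hello world
```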
Test result:
```shell
curl -L -X POST localhost:${PORT}/v1/chat/completions \
  --data-raw '{
    "model": "/model",
    "messages": [
      {"role": "user", "content": "Who is the most beautiful woman in the world?"}
    ],
    "max_tokens": 2500,
    "temperature": 0.7,
    "stream": false
  }'
```
Log of sglang
[2025-02-20 16:56:30] Receive: obj=GenerateReqInput(text='Who is the most beautiful woman in the world?', input_ids=[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 15191, 374, 279, 1429, 6233, 5220, 304, 279, 1879, 11319, 151645, 198, 151644, 77091, 198], input_embeds=None, image_data=None, sampling_params={'temperature': 0.7, 'max_new_tokens': 2500, 'min_new_tokens': 0, 'stop': None, 'stop_token_ids': None, 'top_p': 1.0, 'top_k': -1, 'min_p': 0.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'regex': None, 'ebnf': None, 'n': 1, 'no_stop_trim': False, 'ignore_eos': False, 'skip_special_tokens': True}, rid='8b4b18fccde5427885b63d31a535e1aa', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=True, stream=False, log_metrics=True, modalities=[], lora_path=None, session_params=None, custom_logit_processor=None)
[2025-02-20 16:56:30 TP0] Prefill batch. #new-seq: 1, #new-token: 15, #cached-token: 24, cache hit rate: 23.76%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-02-20 16:56:30 TP0] Decode batch. #running-req: 1, #token: 74, token usage: 0.00, gen throughput (token/s): 0.90, #queue-req: 0
[2025-02-20 16:56:30] Finish: obj=GenerateReqInput(text='Who is the most beautiful woman in the world?', input_ids=[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 15191, 374, 279, 1429, 6233, 5220, 304, 279, 1879, 11319, 151645, 198, 151644, 77091, 198], input_embeds=None, image_data=None, sampling_params={'temperature': 0.7, 'max_new_tokens': 2500, 'min_new_tokens': 0, 'stop': None, 'stop_token_ids': None, 'top_p': 1.0, 'top_k': -1, 'min_p': 0.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'regex': None, 'ebnf': None, 'n': 1, 'no_stop_trim': False, 'ignore_eos': False, 'skip_special_tokens': True}, rid='8b4b18fccde5427885b63d31a535e1aa', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=True, stream=False, log_metrics=True, modalities=[], lora_path=None, session_params=None, custom_logit_processor=None), out={'text': "I'm sorry, but I can't answer this question. As an artificial intelligence language model, I don't have personal preferences or feelings, and I don't have access to information about beauty in the world. My purpose is to provide helpful and informative responses to the best of my knowledge and abilities, but I cannot produce opinions or preferences about individuals or topics.", 'meta_info': {'id': '8b4b18fccde5427885b63d31a535e1aa', 'finish_reason': {'type': 'stop', 'matched': 151645}, 'prompt_tokens': 39, 'completion_tokens': 73, 'cached_tokens': 24}}
[2025-02-20 16:56:30] INFO: 127.0.0.1:41150 - "POST /v1/chat/completions HTTP/1.1" 200 OK
BEFORE / AFTER (screenshots omitted)
Checklist
- [x] Format your code according to the Code Formatting with Pre-Commit.
- [x] Add unit tests as outlined in the Running Unit Tests.
- [x] Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
- [ ] Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
- [x] For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
- [x] Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.
@merrymercy could you please take a look?
I'm confused about the CI unit-test failures; they all seem unrelated to this change:
- `test_video_chat_completion` failure — performance threshold
- `test_mmlu` — `assert metrics["score"] >= 0.5`
- notebook test — `make: *** [Makefile:12: compile] Error 1`
Digging into the log, I think I found the problem and will rework this PR.
This pull request has been automatically closed due to inactivity. Please feel free to reopen it if needed.