sglang
[Bug] The TopLogprobs are the same for each streamed token
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.
Describe the bug
Using llama3-8b, I send a "hello" request with streaming and logprobs enabled, and the first two chunks I get back are:

```
ChatCompletionChunk(id='0bf6d1f81b4741399cf9c84a36e6d401', choices=[Choice(delta=ChoiceDelta(content='Hello', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None), finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0), TopLogprob(token='#', bytes=[35], logprob=-3.4028234663852886e+38), TopLogprob(token='!', bytes=[33], logprob=-3.4028234663852886e+38), TopLogprob(token='$', bytes=[36], logprob=-3.4028234663852886e+38), TopLogprob(token='"', bytes=[34], logprob=-3.4028234663852886e+38)])], refusal=None), matched_stop=None)], created=1745677065, model='model', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)

ChatCompletionChunk(id='0bf6d1f81b4741399cf9c84a36e6d401', choices=[Choice(delta=ChoiceDelta(content='!', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None), finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0), TopLogprob(token='#', bytes=[35], logprob=-3.4028234663852886e+38), TopLogprob(token='!', bytes=[33], logprob=-3.4028234663852886e+38), TopLogprob(token='$', bytes=[36], logprob=-3.4028234663852886e+38), TopLogprob(token='"', bytes=[34], logprob=-3.4028234663852886e+38)]), ChatCompletionTokenLogprob(token='!', bytes=[33], logprob=0.0, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0), TopLogprob(token='#', bytes=[35], logprob=-3.4028234663852886e+38), TopLogprob(token='!', bytes=[33], logprob=-3.4028234663852886e+38), TopLogprob(token='$', bytes=[36], logprob=-3.4028234663852886e+38), TopLogprob(token='"', bytes=[34], logprob=-3.4028234663852886e+38)])], refusal=None), matched_stop=None)], created=1745677065, model='model', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
```
Note that the second chunk re-sends the first token's logprobs: it contains entries for both 'Hello' and '!', and the top_logprobs listed for '!' are identical to those of 'Hello'. The correct top_logprobs for the second token cannot be found anywhere.
Reproduction
Send a streaming chat completion with logprobs enabled to any Llama model (e.g. llama3-8b) to reproduce.
Environment
v0.4.5
This can be fixed by appending `n_prev_tokens[index] = n_prev_token` on the line after each occurrence of `is_firsts[index] = is_first` (3 positions):

```python
# sglang/python/sglang/srt/openai_api/adapter.py
...
is_firsts[index] = is_first
n_prev_tokens[index] = n_prev_token  # added
...
```
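To see why the missing line causes the duplicated output above, here is a minimal, self-contained sketch (not sglang's actual code) of the streaming slicing logic: the server accumulates per-token logprob entries, and each chunk should emit only the entries produced since the last chunk, which requires remembering how many were already sent (`n_prev_token`). The names here are illustrative.

```python
def stream_chunks(all_token_logprobs, persist_n_prev_token=True):
    """Yield the logprob slice that each streaming chunk would carry.

    all_token_logprobs: the cumulative list of per-token logprob entries,
    as it grows by one token per chunk.
    """
    n_prev_token = 0
    for cum_len in range(1, len(all_token_logprobs) + 1):
        # Each chunk should contain only the newly generated tokens' logprobs.
        yield all_token_logprobs[n_prev_token:cum_len]
        if persist_n_prev_token:
            # The fix: remember how many entries have already been streamed.
            n_prev_token = cum_len

tokens = ["Hello", "!", " How"]

# With n_prev_token persisted, each chunk carries exactly one new entry.
fixed = list(stream_chunks(tokens, persist_n_prev_token=True))
# -> [['Hello'], ['!'], [' How']]

# Without persisting it (the bug), every chunk re-sends the whole prefix,
# which is why the second chunk above repeats 'Hello' and its top_logprobs.
buggy = list(stream_chunks(tokens, persist_n_prev_token=False))
# -> [['Hello'], ['Hello', '!'], ['Hello', '!', ' How']]
```

This matches the reported output: the buggy second chunk is the cumulative list `['Hello', '!']` instead of just the new token `['!']`.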
Could you fix this minor bug when you get a chance? @merrymercy
Did you by any chance set top_p in the request? @Snowdar