
[Bug] The top_logprobs are the same for each streamed token

Open Snowdar opened this issue 8 months ago

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • [x] 5. Please use English, otherwise it will be closed.

Describe the bug

Using llama3-8b, I send the request "hello" and get the first two chunks below:

ChatCompletionChunk(id='0bf6d1f81b4741399cf9c84a36e6d401', choices=[Choice(delta=ChoiceDelta(content='Hello', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None), finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0), TopLogprob(token='#', bytes=[35], logprob=-3.4028234663852886e+38), TopLogprob(token='!', bytes=[33], logprob=-3.4028234663852886e+38), TopLogprob(token='$', bytes=[36], logprob=-3.4028234663852886e+38), TopLogprob(token='"', bytes=[34], logprob=-3.4028234663852886e+38)])], refusal=None), matched_stop=None)], created=1745677065, model='model', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
ChatCompletionChunk(id='0bf6d1f81b4741399cf9c84a36e6d401', choices=[Choice(delta=ChoiceDelta(content='!', function_call=None, refusal=None, role=None, tool_calls=None, reasoning_content=None), finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=[ChatCompletionTokenLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0), TopLogprob(token='#', bytes=[35], logprob=-3.4028234663852886e+38), TopLogprob(token='!', bytes=[33], logprob=-3.4028234663852886e+38), TopLogprob(token='$', bytes=[36], logprob=-3.4028234663852886e+38), TopLogprob(token='"', bytes=[34], logprob=-3.4028234663852886e+38)]), ChatCompletionTokenLogprob(token='!', bytes=[33], logprob=0.0, top_logprobs=[TopLogprob(token='Hello', bytes=[72, 101, 108, 108, 111], logprob=0.0), TopLogprob(token='#', bytes=[35], logprob=-3.4028234663852886e+38), TopLogprob(token='!', bytes=[33], logprob=-3.4028234663852886e+38), TopLogprob(token='$', bytes=[36], logprob=-3.4028234663852886e+38), TopLogprob(token='"', bytes=[34], logprob=-3.4028234663852886e+38)])], refusal=None), matched_stop=None)], created=1745677065, model='model', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)

The correct top_logprobs for the second token ('!') cannot be found: the second chunk re-sends the first token's entry and attaches the first token's top_logprobs list to the '!' token as well.
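A side note that may help triage (this is my reading of the dump, not something stated in the report): the repeated logprob value -3.4028234663852886e+38 is exactly the most negative finite float32, which looks like a masking sentinel rather than a real probability, consistent with top_p filtering masking out every candidate except the sampled token. A quick check:

```python
# -(2 - 2**-23) * 2**127 is the most negative finite IEEE-754 float32 value.
FLOAT32_MIN = -(2 - 2**-23) * 2**127

# The sentinel logprob attached to every non-sampled candidate in the
# ChatCompletionChunk dumps above.
SENTINEL_LOGPROB = -3.4028234663852886e+38

assert FLOAT32_MIN == SENTINEL_LOGPROB
```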

Reproduction

You can reproduce this with any Llama model: send a streaming chat completion with logprobs and top_logprobs enabled and compare the top_logprobs across consecutive chunks.
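A reproduction sketch (the server URL, `api_key`, and the use of the openai client here are illustrative assumptions, not taken from the report):

```python
def make_repro_kwargs(model: str = "model") -> dict:
    """Arguments for a streaming chat completion with top logprobs enabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "hello"}],
        "stream": True,
        "logprobs": True,
        "top_logprobs": 5,
        "top_p": 1.0,  # the bug reproduces with top_p set (see discussion below)
    }

# With a running sglang server, something like:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
#   for chunk in client.chat.completions.create(**make_repro_kwargs()):
#       print(chunk.choices[0].logprobs)
# shows each chunk's logprobs.content re-sending the earlier tokens'
# entries instead of only the new token's.
```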

Environment

v0.4.5

Snowdar avatar Apr 26 '25 14:04 Snowdar

This can be fixed by appending `n_prev_tokens[index] = n_prev_token` on the line after each occurrence of `is_firsts[index] = is_first` (3 positions):

# sglang/python/sglang/srt/openai_api/adapter.py
...
is_firsts[index] = is_first
n_prev_tokens[index] = n_prev_token # added
...
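To illustrate why the missing write-back matters, here is a minimal sketch (not the actual adapter code): the engine reports a cumulative logprobs list on each step, and `n_prev_tokens[index]` records how many entries were already streamed. If the handler never updates that offset, every chunk re-sends all earlier entries:

```python
def stream_slices(cumulative_logprobs, update_offset: bool):
    """Return the per-chunk slices of newly streamed logprob entries.

    cumulative_logprobs: the growing list the engine reports each step.
    update_offset: whether the handler writes back the new offset
    (the line the proposed fix adds).
    """
    n_prev_tokens = {0: 0}  # per-choice offset of already-emitted entries
    index = 0
    chunks = []
    for step in range(1, len(cumulative_logprobs) + 1):
        current = cumulative_logprobs[:step]    # engine's cumulative view
        n_prev = n_prev_tokens[index]
        chunks.append(current[n_prev:])         # what this chunk sends
        if update_offset:
            n_prev_tokens[index] = len(current)  # the missing write-back
    return chunks

tokens = ["Hello", "!", " How"]
buggy = stream_slices(tokens, update_offset=False)
fixed = stream_slices(tokens, update_offset=True)
# buggy: [['Hello'], ['Hello', '!'], ['Hello', '!', ' How']]
# fixed: [['Hello'], ['!'], [' How']]
```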

Could you fix this minor bug by the way? @merrymercy

Snowdar avatar May 13 '25 03:05 Snowdar

Did you by any chance set top_p in the request? @Snowdar

CatherineSue avatar May 19 '25 04:05 CatherineSue

Did you by any chance set top_p in the request? @Snowdar

Yes, I set it to 1 as well as to values < 1; the behavior is the same.

Snowdar avatar May 20 '25 03:05 Snowdar