[Bugfix] Fix off-by-one bug in decode_prompt_logprobs_inplace()
Skip the first token in the sequence, since its logprob is not defined and not computed. Otherwise the token_id == all_token_ids[token_position] check does not work properly and causes detokenize_incrementally to produce incorrect results.
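For context, "not defined and not computed" follows from how a causal LM scores a prompt: token i is predicted from the tokens before it, so there is nothing to score the first token against and the model yields only len(prompt) - 1 per-token logprobs. A quick independent sanity check with HF transformers (the model and prompt here are just examples, not part of this PR):

# Sanity check: a causal LM yields one logprob per prompt token except the
# first, because logits at position i predict token i + 1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-125m"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("\n1. Carol has 2000 books in her", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits                        # [1, seq_len, vocab]
logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # predictions for tokens 1..n-1
token_logprobs = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
print(ids.shape[1], token_logprobs.shape[1])          # prints n and n - 1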
FIX #4904 FIX #4772 FIX #5334 FIX #5872
Hey! I am not sure whether skipping the first token would fix #5334. Have you tested the case in the issue? I think it is something specific to llama-2 as the llama-3 models seem to work correctly.
Let me try to reproduce your example. I am seeing exactly the same behavior as in your issue (long, repeated detokenized text that is not in the vocab).
I think the PR should fix https://github.com/vllm-project/vllm/issues/5334.
Before
vLLM Completion text:
1. Carol has 2000 books in her
vLLM tokens: ['<s>', '', '\n', '1', '.', '\n Carol', '\n\n has', '\n\n\n ', '\n\n\n\n2', '\n\n\n\n\n0', '0', '0', ' books', ' in', ' her']
vLLM Completion logprobs: Logprobs(text_offset=[0, 3, 3, 4, 5, 6, 13, 19, 23, 28, 34, 35, 36, 42, 45], token_logprobs=[None, -4.107686996459961, -4.1728644371032715, -5.700852870941162, -0.9313492774963379, -10.605775833129883, -6.336761951446533, -4.377373218536377, -1.8037347793579102, -1.6492366790771484, -2.174774408340454, -2.6790499687194824, -2.831897497177124, -1.0957883596420288, -0.15247809886932373], tokens=['<s>', '', '\n', '1', '.', '\n Carol', '\n\n has', '\n\n\n ', '\n\n\n\n2', '\n\n\n\n\n0', '0', '0', ' books', ' in', ' her'], top_logprobs=[None, {'': -4.107686996459961, 'Tags': -2.52565598487854}, {'\n': -4.1728644371032715, '1': -1.438489556312561}, {'1': -5.700852870941162, '\n': -1.1266340017318726}, {'.': -0.9313492774963379}, {'\n Carol': -10.605775833129883, '\n The': -2.7600724697113037}, {'\n\n has': -6.336761951446533, '\n\nyn': -1.4324650764465332}, {'\n\n\n ': -4.377373218536377, '\n\n\n a': -1.701591968536377}, {'\n\n\n\n2': -1.8037347793579102, '\n\n\n\n1': -1.3818597793579102}, {'\n\n\n\n\n0': -1.6492366790771484}, {'0': -2.174774408340454}, {'0': -2.6790499687194824}, {' books': -2.831897497177124}, {' in': -1.0957883596420288}, {' her': -0.15247809886932373}])
After
vLLM Completion text:
1. Carol has 2000 books in her
vLLM tokens: ['<s>', '', '\n', '1', '.', ' Carol', ' has', ' ', '2', '0', '0', '0', ' books', ' in', ' her']
vLLM Completion logprobs: Logprobs(text_offset=[0, 3, 3, 4, 5, 6, 12, 16, 17, 18, 19, 20, 21, 27, 30], token_logprobs=[None, -4.107686996459961, -4.1728644371032715, -5.700852870941162, -0.9313492774963379, -10.605775833129883, -6.336761951446533, -4.377373218536377, -1.8037347793579102, -1.6492366790771484, -2.174774408340454, -2.6790499687194824, -2.831897497177124, -1.0957883596420288, -0.15247809886932373], tokens=['<s>', '', '\n', '1', '.', ' Carol', ' has', ' ', '2', '0', '0', '0', ' books', ' in', ' her'], top_logprobs=[None, {'': -4.107686996459961, 'Tags': -2.52565598487854}, {'\n': -4.1728644371032715, '1': -1.438489556312561}, {'1': -5.700852870941162, '\n': -1.1266340017318726}, {'.': -0.9313492774963379}, {' Carol': -10.605775833129883, ' The': -2.7600724697113037}, {' has': -6.336761951446533, 'yn': -1.4324650764465332}, {' ': -4.377373218536377, ' a': -1.701591968536377}, {'2': -1.8037347793579102, '1': -1.3818597793579102}, {'0': -1.6492366790771484}, {'0': -2.174774408340454}, {'0': -2.6790499687194824}, {' books': -2.831897497177124}, {' in': -1.0957883596420288}, {' her': -0.15247809886932373}])
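The dumps above appear to come from the OpenAI-compatible completions endpoint with echo and logprobs enabled. A minimal offline sketch that exercises the same decode_prompt_logprobs_inplace() path is below; the model name and prompt are assumptions, not the exact script used for these dumps:

# Hypothetical minimal repro (assumed model/prompt, not the original script):
# requesting prompt_logprobs makes vLLM detokenize every prompt position via
# decode_prompt_logprobs_inplace(), which is where the broken '\n Carol',
# '\n\n has', ... strings in the "Before" dump originate.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(max_tokens=1, logprobs=2, prompt_logprobs=2)

outputs = llm.generate(["\n1. Carol has 2000 books in her"], params)

# prompt_logprobs has one entry per prompt token; the first entry is None
# because the first token's logprob is never computed.
for pos, logprobs in enumerate(outputs[0].prompt_logprobs):
    if logprobs is None:
        continue
    for token_id, lp in logprobs.items():
        # lp.decoded_token is filled in by the detokenizer; before the fix it
        # could contain text built from a stale incremental-detokenizer state.
        print(pos, token_id, repr(lp.decoded_token), lp.logprob)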
This is great, thanks! May I ask what you think causes the issue? Is it just not skipping the first/BOS token? But if so, why does llama-3 not have this problem?
I updated the PR description trying to explain the issue.
I don't think it's about any particular model (I saw the same with facebook/opt-125m). The regression test and your example have something in common: one of the top-logprob tokens happens to be the same as an adjacent prompt token (3290, i.e. '\n').
prompt_token_ids = [3290, 1562, ...]
top_logprobs = [None, {
1562: Logprob(logprob=0.0),
3290: Logprob(logprob=0.1)
}, {
8652: Logprob(logprob=0.0),
977: Logprob(logprob=0.1)
}, ...]
top_logprobs=[
None,
{'': -4.107686996459961, 'Tags': -2.52565598487854},
{'\n': -4.1728644371032715, '1': -1.438489556312561},
{'1': -5.700852870941162, '\n': -1.1266340017318726},
...
]
When that happens, it spuriously triggers the token_id == all_token_ids[token_position] check and puts detokenize_incrementally in a bad state.
In most cases, I think it only unintentionally skips over one token and still ends up with correct results.
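To make the collision concrete, here is a small self-contained illustration using the ids from the example above (the chosen_tokens helper and its offset argument are made up for this sketch, not vLLM code). It shows one plausible way the off-by-one plays out: if the position used in the check lags the real prompt position by one, the candidate 3290 at position 1 satisfies the check instead of the real token 1562, the check never fires at position 2, and the incremental-detokenizer state is advanced with the wrong token (or not at all).

# Toy illustration only: no vLLM imports, token ids taken from the example.
prompt_token_ids = [3290, 1562, 8652]        # e.g. '\n', '1', '.'
top_logprobs = [
    None,                                    # position 0: never computed
    {1562: -0.1, 3290: -2.5},                # position 1: 3290 == prompt token 0
    {8652: -0.2, 977: -3.0},                 # position 2
]

def chosen_tokens(all_token_ids, logprob_dicts, offset):
    """Return, per position, the candidates picked by the
    token_id == all_token_ids[token_position] check when token_position
    is shifted by `offset` relative to the real prompt position."""
    picked = []
    for real_pos, logprobs in enumerate(logprob_dicts):
        if logprobs is None:
            continue
        token_position = real_pos + offset
        matches = [t for t in logprobs if t == all_token_ids[token_position]]
        picked.append((real_pos, matches))
    return picked

print(chosen_tokens(prompt_token_ids, top_logprobs, offset=0))
# [(1, [1562]), (2, [8652])]  -> aligned: the real prompt tokens are picked
print(chosen_tokens(prompt_token_ids, top_logprobs, offset=-1))
# [(1, [3290]), (2, [])]      -> off by one: the adjacent '\n' is picked instead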
I'm going to take a test case from here and add you as a co-author on the PR. I think my approach addresses the root cause, whereas this one addresses a symptom and could cause issues in the future.
Closing this PR in favor of https://github.com/vllm-project/vllm/pull/6223.