[Bugfix] Fix off-by-one bug in decode_prompt_logprobs_inplace()
Skip the first token in the sequence, since its logprob is not defined and not computed. Otherwise the token_id == all_token_ids[token_position] check does not work properly and causes detokenize_incrementally to produce incorrect results.
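For context, "not defined and not computed" follows from how a causal LM scores a prompt: token i is predicted from the tokens before it, so there is nothing to score the first token against and the model yields only len(prompt) - 1 per-token logprobs. A quick independent sanity check with HF transformers (the model and prompt here are just examples, not part of this PR):

# Sanity check: a causal LM yields one logprob per prompt token except the
# first, because logits at position i predict token i + 1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-125m"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

ids = tok("\n1. Carol has 2000 books in her", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits                        # [1, seq_len, vocab]
logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # predictions for tokens 1..n-1
token_logprobs = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
print(ids.shape[1], token_logprobs.shape[1])          # prints n and n - 1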
FIX #4904 FIX #4772 FIX #5334 FIX #5872
Hey! I am not sure whether skipping the first token would fix #5334. Have you tested the case in the issue? I think it is something specific to llama-2 as the llama-3 models seem to work correctly.
Let me try to reproduce your example. I am seeing exactly the same behavior as in your issue (long, repeated detokenized text that is not in the vocab).
I think the PR should fix https://github.com/vllm-project/vllm/issues/5334.
Before
vLLM Completion text:
1. Carol has 2000 books in her
vLLM tokens: ['<s>', '', '\n', '1', '.', '\n Carol', '\n\n has', '\n\n\n ', '\n\n\n\n2', '\n\n\n\n\n0', '0', '0', ' books', ' in', ' her']
vLLM Completion logprobs: Logprobs(text_offset=[0, 3, 3, 4, 5, 6, 13, 19, 23, 28, 34, 35, 36, 42, 45], token_logprobs=[None, -4.107686996459961, -4.1728644371032715, -5.700852870941162, -0.9313492774963379, -10.605775833129883, -6.336761951446533, -4.377373218536377, -1.8037347793579102, -1.6492366790771484, -2.174774408340454, -2.6790499687194824, -2.831897497177124, -1.0957883596420288, -0.15247809886932373], tokens=['<s>', '', '\n', '1', '.', '\n Carol', '\n\n has', '\n\n\n ', '\n\n\n\n2', '\n\n\n\n\n0', '0', '0', ' books', ' in', ' her'], top_logprobs=[None, {'': -4.107686996459961, 'Tags': -2.52565598487854}, {'\n': -4.1728644371032715, '1': -1.438489556312561}, {'1': -5.700852870941162, '\n': -1.1266340017318726}, {'.': -0.9313492774963379}, {'\n Carol': -10.605775833129883, '\n The': -2.7600724697113037}, {'\n\n has': -6.336761951446533, '\n\nyn': -1.4324650764465332}, {'\n\n\n ': -4.377373218536377, '\n\n\n a': -1.701591968536377}, {'\n\n\n\n2': -1.8037347793579102, '\n\n\n\n1': -1.3818597793579102}, {'\n\n\n\n\n0': -1.6492366790771484}, {'0': -2.174774408340454}, {'0': -2.6790499687194824}, {' books': -2.831897497177124}, {' in': -1.0957883596420288}, {' her': -0.15247809886932373}])
After
vLLM Completion text:
1. Carol has 2000 books in her
vLLM tokens: ['<s>', '', '\n', '1', '.', ' Carol', ' has', ' ', '2', '0', '0', '0', ' books', ' in', ' her']
vLLM Completion logprobs: Logprobs(text_offset=[0, 3, 3, 4, 5, 6, 12, 16, 17, 18, 19, 20, 21, 27, 30], token_logprobs=[None, -4.107686996459961, -4.1728644371032715, -5.700852870941162, -0.9313492774963379, -10.605775833129883, -6.336761951446533, -4.377373218536377, -1.8037347793579102, -1.6492366790771484, -2.174774408340454, -2.6790499687194824, -2.831897497177124, -1.0957883596420288, -0.15247809886932373], tokens=['<s>', '', '\n', '1', '.', ' Carol', ' has', ' ', '2', '0', '0', '0', ' books', ' in', ' her'], top_logprobs=[None, {'': -4.107686996459961, 'Tags': -2.52565598487854}, {'\n': -4.1728644371032715, '1': -1.438489556312561}, {'1': -5.700852870941162, '\n': -1.1266340017318726}, {'.': -0.9313492774963379}, {' Carol': -10.605775833129883, ' The': -2.7600724697113037}, {' has': -6.336761951446533, 'yn': -1.4324650764465332}, {' ': -4.377373218536377, ' a': -1.701591968536377}, {'2': -1.8037347793579102, '1': -1.3818597793579102}, {'0': -1.6492366790771484}, {'0': -2.174774408340454}, {'0': -2.6790499687194824}, {' books': -2.831897497177124}, {' in': -1.0957883596420288}, {' her': -0.15247809886932373}])
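The dumps above appear to come from the OpenAI-compatible completions endpoint with echo and logprobs enabled. A minimal offline sketch that exercises the same decode_prompt_logprobs_inplace() path is below; the model name and prompt are assumptions, not the exact script used for these dumps:

# Hypothetical minimal repro (assumed model/prompt, not the original script):
# requesting prompt_logprobs makes vLLM detokenize every prompt position via
# decode_prompt_logprobs_inplace(), which is where the broken '\n Carol',
# '\n\n has', ... strings in the "Before" dump originate.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(max_tokens=1, logprobs=2, prompt_logprobs=2)

outputs = llm.generate(["\n1. Carol has 2000 books in her"], params)

# prompt_logprobs has one entry per prompt token; the first entry is None
# because the first token's logprob is never computed.
for pos, logprobs in enumerate(outputs[0].prompt_logprobs):
    if logprobs is None:
        continue
    for token_id, lp in logprobs.items():
        # lp.decoded_token is filled in by the detokenizer; before the fix it
        # could contain text built from a stale incremental-detokenizer state.
        print(pos, token_id, repr(lp.decoded_token), lp.logprob)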
This is great, thanks! May I ask what you think causes the issue? Is it just not skipping the first/BOS token? But if so, why does llama-3 not have this problem?
I updated the PR description trying to explain the issue.
I don't think it's about any particular model (I saw the same with facebook/opt-125m). The regression test and your example have something in common: one of the top-logprob tokens happens to be the same as an adjacent prompt token (3290, i.e. '\n').
prompt_token_ids = [3290, 1562, ...]
top_logprobs = [None, {
1562: Logprob(logprob=0.0),
3290: Logprob(logprob=0.1)
}, {
8652: Logprob(logprob=0.0),
977: Logprob(logprob=0.1)
}, ...]
top_logprobs=[
None,
{'': -4.107686996459961, 'Tags': -2.52565598487854},
{'\n': -4.1728644371032715, '1': -1.438489556312561},
{'1': -5.700852870941162, '\n': -1.1266340017318726},
...
]
When that happens, it spuriously triggers the token_id == all_token_ids[token_position] check and puts detokenize_incrementally in a bad state.
In most cases, I think it only unintentionally skips over one token and still ends up with correct results.
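To make the collision concrete, here is a small self-contained illustration using the ids from the example above (the chosen_tokens helper and its offset argument are made up for this sketch, not vLLM code). It shows one plausible way the off-by-one plays out: if the position used in the check lags the real prompt position by one, the candidate 3290 at position 1 satisfies the check instead of the real token 1562, the check never fires at position 2, and the incremental-detokenizer state is advanced with the wrong token (or not at all).

# Toy illustration only: no vLLM imports, token ids taken from the example.
prompt_token_ids = [3290, 1562, 8652]        # e.g. '\n', '1', '.'
top_logprobs = [
    None,                                    # position 0: never computed
    {1562: -0.1, 3290: -2.5},                # position 1: 3290 == prompt token 0
    {8652: -0.2, 977: -3.0},                 # position 2
]

def chosen_tokens(all_token_ids, logprob_dicts, offset):
    """Return, per position, the candidates picked by the
    token_id == all_token_ids[token_position] check when token_position
    is shifted by `offset` relative to the real prompt position."""
    picked = []
    for real_pos, logprobs in enumerate(logprob_dicts):
        if logprobs is None:
            continue
        token_position = real_pos + offset
        matches = [t for t in logprobs if t == all_token_ids[token_position]]
        picked.append((real_pos, matches))
    return picked

print(chosen_tokens(prompt_token_ids, top_logprobs, offset=0))
# [(1, [1562]), (2, [8652])]  -> aligned: the real prompt tokens are picked
print(chosen_tokens(prompt_token_ids, top_logprobs, offset=-1))
# [(1, [3290]), (2, [])]      -> off by one: the adjacent '\n' is picked instead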
I'm going to take a test case from here and add you as a co-author on the PR. I think my approach addresses the root cause, whereas this one addresses a symptom and could cause issues in the future.
Closing this PR in favor of https://github.com/vllm-project/vllm/pull/6223.