[BugFix] Fix completion_stream_generator returning two stops (#2266)
fix #2266
Hi @allenhaozi, thanks for submitting the PR. According to my understanding, neither the current main branch nor this PR exactly implements the OpenAI API's response format. This PR omits the last generated token. Could you please fix this?
- OpenAI API:
data: {"id":"cmpl-8d5HgThgqr3rCPyDdnr4luTSlaT1A","object":"text_completion","created":1704325796,"choices":[{"text":" each","index":0,"logprobs":null,"finish_reason":null}],"model":"davinci-002"}
data: {"id":"cmpl-8d5HgThgqr3rCPyDdnr4luTSlaT1A","object":"text_completion","created":1704325796,"choices":[{"text":" neighborhood","index":0,"logprobs":null,"finish_reason":"length"}],"model":"davinci-002"}
data: [DONE]
- Current main:
data: {"id": "cmpl-3ffa8be79267457f9cfc139696be5905", "created": 64863, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": " many", "logprobs": null, "finish_reason": "length"}]}
data: {"id": "cmpl-3ffa8be79267457f9cfc139696be5905", "created": 64863, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 5, "total_tokens": 12, "completion_tokens": 7}}
data: [DONE]
- This PR:
data: {"id": "cmpl-3473906e66f0494dbf6abf3d6e807222", "created": 65102, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": " its", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-3473906e66f0494dbf6abf3d6e807222", "created": 65102, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 5, "total_tokens": 12, "completion_tokens": 7}}
data: [DONE]
Replaced by #3450. Thank you for finding the bug and providing the initial solution!