[BugFix] Fix completion_stream_generator returning two stops (#2266)
fix #2266
Hi @allenhaozi, thanks for submitting the PR. According to my understanding, neither the current main branch nor this PR exactly implements the OpenAI API's response format. This PR omits the last generated token. Could you please fix this?
- OpenAI API:
data: {"id":"cmpl-8d5HgThgqr3rCPyDdnr4luTSlaT1A","object":"text_completion","created":1704325796,"choices":[{"text":" each","index":0,"logprobs":null,"finish_reason":null}],"model":"davinci-002"}
data: {"id":"cmpl-8d5HgThgqr3rCPyDdnr4luTSlaT1A","object":"text_completion","created":1704325796,"choices":[{"text":" neighborhood","index":0,"logprobs":null,"finish_reason":"length"}],"model":"davinci-002"}
data: [DONE]
- Current main:
data: {"id": "cmpl-3ffa8be79267457f9cfc139696be5905", "created": 64863, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": " many", "logprobs": null, "finish_reason": "length"}]}
data: {"id": "cmpl-3ffa8be79267457f9cfc139696be5905", "created": 64863, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 5, "total_tokens": 12, "completion_tokens": 7}}
data: [DONE]
- This PR:
data: {"id": "cmpl-3473906e66f0494dbf6abf3d6e807222", "created": 65102, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": " its", "logprobs": null, "finish_reason": null}]}
data: {"id": "cmpl-3473906e66f0494dbf6abf3d6e807222", "created": 65102, "model": "meta-llama/Llama-2-7b-hf", "choices": [{"index": 0, "text": "", "logprobs": null, "finish_reason": "length"}], "usage": {"prompt_tokens": 5, "total_tokens": 12, "completion_tokens": 7}}
data: [DONE]
Replaced by #3450. Thank you for finding the bug and providing the initial solution!