Streaming mode doesn't work for the in-house engine
🐛 Describe the bug
{"id": "chatcmpl-1753420585601835809", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "", "reasoning_content": "", "tool_calls": []}, "finish_reason": "stop"}], "created": 1753420585, "model": "deepseek-r1", "system_fingerprint": "fp", "object": "chat.completion.chunk", "usage": {"prompt_tokens": 5, "completion_tokens": 40, "total_tokens": 45, "completion_tokens_details": {"reasoning_tokens": 0}}}
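It seems `reasoning_content` and `tool_calls` are not standard output here. (`reasoning_content` is an engine-specific extension; `tool_calls` is part of the OpenAI chunk delta, but this chunk sends it as an empty list.)
If the user hits the engine directly, everything works fine.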
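To check which delta fields in a chunk fall outside the OpenAI streaming schema, a small helper like the one below can be run against captured chunks. It is a minimal sketch. The allowed key set `OPENAI_DELTA_KEYS` is an assumption based on the OpenAI Chat Completions chunk schema, and the helper name is hypothetical.

```python
import json

# Assumption: delta keys defined by the OpenAI Chat Completions streaming schema.
# Engine-specific extensions (e.g. reasoning_content) are deliberately not listed.
OPENAI_DELTA_KEYS = {"role", "content", "tool_calls", "function_call", "refusal"}


def nonstandard_delta_keys(chunk_json: str) -> list[str]:
    """Hypothetical helper: return delta keys of a chat.completion.chunk
    that are not in OPENAI_DELTA_KEYS, sorted alphabetically."""
    chunk = json.loads(chunk_json)
    extra = set()
    for choice in chunk.get("choices", []):
        extra.update(k for k in choice.get("delta", {}) if k not in OPENAI_DELTA_KEYS)
    return sorted(extra)


chunk = (
    '{"object": "chat.completion.chunk", "choices": [{"index": 0, '
    '"delta": {"role": "assistant", "content": "", "reasoning_content": "", '
    '"tool_calls": []}, "finish_reason": "stop"}]}'
)
print(nonstandard_delta_keys(chunk))  # ['reasoning_content']
```

For the chunk in this report, only `reasoning_content` is flagged; an extra field like this is usually ignored by lenient OpenAI-compatible clients.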
Steps to Reproduce
Deploy xllm behind aibrix (xllm + aibrix) and send a streaming chat completion request through the gateway.
Expected behavior
Streaming through the aibrix gateway should work the same way as hitting the engine directly.
Environment
nightly
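This is related to multi-engine support.
I checked the JSON string conversion, and it works. From the error, the connection is closed before the stream can finish.
On further debugging, the root cause is the message format during streaming: the engine's response bodies are missing the SSE `data: ` prefix, as the comparison below shows.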
Expected response
response_body_1
data: {ChatCompletionChunk}\n\n
response_body_2
data: {ChatCompletionChunk}\n\n data: {ChatCompletionChunk}\n\n
response_body_n
data: [DONE]
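The expected format is standard Server-Sent Events (SSE) framing: each chunk is written as `data: <json>` followed by a blank line, and the stream ends with a `[DONE]` event. A minimal encoder sketch is below; the function names are hypothetical, and framing `[DONE]` with a trailing blank line like every other event is an assumption that matches the OpenAI API's behavior.

```python
import json


def sse_frame(payload: str) -> bytes:
    """Wrap one payload as an SSE event: 'data: <payload>' plus a blank line."""
    return f"data: {payload}\n\n".encode("utf-8")


def encode_stream(chunks):
    """Yield one SSE-framed event per chunk dict, then the [DONE] terminator."""
    for chunk in chunks:
        yield sse_frame(json.dumps(chunk))
    yield sse_frame("[DONE]")


out = b"".join(encode_stream([{"id": "a"}, {"id": "b"}]))
print(out)  # b'data: {"id": "a"}\n\ndata: {"id": "b"}\n\ndata: [DONE]\n\n'
```

Since every event carries its own terminator, a response body can hold one or several complete events (as in response_body_2), and the receiver can still split them reliably.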
Actual response
response_body_1
{ChatCompletionChunk}\n\n
response_body_2
{ChatCompletionChunk} {ChatCompletionChunk}\n\n
response_body_n
[DONE]
Because of this difference in response formatting, the aibrix gateway plugin fails to parse the stream.
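To illustrate why the unprefixed bodies break the consumer, here is a simplified incremental SSE decoder. It is not the aibrix gateway plugin's code, and all names are hypothetical. It buffers across response bodies, because one event may be split over several of them, and emits a payload only after an event's terminating blank line arrives. A spec-compliant parser silently ignores lines without a `data:` field. This sketch raises instead so the failure is visible. It also assumes `\n` line endings only.

```python
class SSEParseError(ValueError):
    """Raised when a complete event carries no 'data:' field."""


def _data_value(line: str) -> str:
    # Drop the 'data:' field name and at most one leading space, as SSE specifies.
    value = line[len("data:"):]
    return value[1:] if value.startswith(" ") else value


def decode_sse(bodies):
    """Yield data payloads from a sequence of response bodies.

    Events may span several bodies; an unterminated remainder at the end of
    the stream stays in the buffer and is discarded."""
    buffer = ""
    for body in bodies:
        buffer += body
        # An event is complete only once its terminating blank line has arrived.
        while "\n\n" in buffer:
            block, buffer = buffer.split("\n\n", 1)
            if not block.strip():
                continue
            data = [_data_value(line) for line in block.split("\n") if line.startswith("data:")]
            if not data:
                # Stricter than the spec, which would drop this block without an event.
                raise SSEParseError(f"event without a data field: {block[:40]!r}")
            yield "\n".join(data)


expected = ['data: {"n": 1}\n\n', 'data: {"n": 2}\n\ndata: {"n": 3}\n\n', "data: [DONE]\n\n"]
print(list(decode_sse(expected)))  # ['{"n": 1}', '{"n": 2}', '{"n": 3}', '[DONE]']

actual = ['{"n": 1}\n\n', '{"n": 2} {"n": 3}\n\n', "[DONE]"]
try:
    list(decode_sse(actual))
except SSEParseError as err:
    print(err)
```

With the actual bodies, the first block already has no `data:` field. The second body also concatenates two chunks without a separator, so no framing-based parser can split them, and the final `[DONE]` has no terminating blank line.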
/cc @happyandslow