aibrix streaming mode doesn't work for in-house engine

🐛 Describe the bug

{"id": "chatcmpl-1753420585601835809", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "", "reasoning_content": "", "tool_calls": []}, "finish_reason": "stop"}], "created": 1753420585, "model": "deepseek-r1", "system_fingerprint": "fp", "object": "chat.completion.chunk", "usage": {"prompt_tokens": 5, "completion_tokens": 40, "total_tokens": 45, "completion_tokens_details": {"reasoning_tokens": 0}}}

It seems that reasoning_content and tool_calls are not part of the standard output.

If the user hits the engine directly, everything works fine.

Steps to Reproduce

xllm + aibrix

Expected behavior

Streaming through aibrix should work the same way as hitting the engine directly.

Environment

nightly

Jul 28 '25 21:07 Jeffwan

This is multi-engine related.

Jul 28 '25 21:07 Jeffwan

I checked the JSON string conversion, and that works. Judging from the error, the connection is closed before the stream ends.

Jul 30 '25 19:07 varungup90

On further debugging, the root cause is the message format during streaming.

Expected response

response_body_1

data: {ChatCompletionChunk}\n\n

response_body_2

data: {ChatCompletionChunk}\n\n data: {ChatCompletionChunk}\n\n

response_body_n

data: [DONE]

Actual response

response_body_1

{ChatCompletionChunk}\n\n

response_body_2

{ChatCompletionChunk} {ChatCompletionChunk}\n\n

response_body_n

[DONE]

Because of this difference in response formatting, the aibrix gateway plugin fails to parse the stream.
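To illustrate the mismatch, here is a minimal Python sketch of a strict SSE-style parser. It is not the aibrix gateway plugin code; the function name `parse_sse_body` and the sample bodies are hypothetical, and the chunk payload is reduced to a single field. It only shows why a body without the `data:` prefix is rejected while the expected framing parses cleanly.

```python
import json


def parse_sse_body(body: str) -> list:
    """Strictly parse an SSE-framed body (hypothetical helper, not aibrix code).

    Events are separated by a blank line, and each one must start with 'data:'.
    """
    events = []
    for block in body.split("\n\n"):
        block = block.strip()
        if not block:
            continue  # skip the empty piece after the trailing separator
        if not block.startswith("data:"):
            raise ValueError(f"missing 'data:' prefix: {block[:30]!r}")
        payload = block[len("data:"):].strip()
        # The stream terminator is a literal sentinel, not JSON.
        events.append("[DONE]" if payload == "[DONE]" else json.loads(payload))
    return events


# Expected framing: every chunk carries the 'data:' prefix.
expected = 'data: {"object": "chat.completion.chunk"}\n\ndata: [DONE]\n\n'

# Framing as reported above: same payloads, but no 'data:' prefix.
actual = '{"object": "chat.completion.chunk"}\n\n[DONE]'

print(parse_sse_body(expected))

try:
    parse_sse_body(actual)
except ValueError as err:
    print("parse failed:", err)
```

A strict parser like this fails on the first unprefixed chunk, which is consistent with the stream being cut off before it finishes. Whether the fix belongs in the engine (emit `data:` framing) or in the gateway (tolerate unprefixed chunks) is a design choice this sketch does not settle.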

Aug 03 '25 22:08 varungup90

/cc @happyandslow

Aug 05 '25 20:08 Jeffwan