Varun Gupta
We will leave the base case at 120s and provide an option for users to change it via [config using a production overlay](https://github.com/vllm-project/aibrix/issues/847).
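For reference, a production overlay override could look like the kustomize-style sketch below. Note this is illustrative only: the deployment name, flag name (`--request-timeout`), and paths are assumptions, not the actual AIBrix schema.

```yaml
# overlays/production/kustomization.yaml (illustrative sketch, not the real AIBrix layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: gateway-plugins   # hypothetical target deployment
    patch: |-
      # Hypothetical flag: raise the 120s base timeout in production
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: "--request-timeout=300s"
```

The base keeps the 120s default, and only environments that need a longer timeout carry the patch.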
@Venkat2811 The integration test [failed](https://github.com/vllm-project/aibrix/actions/runs/15141937232/job/42568252462?pr=1112); can you please check the error?
After this PR: https://github.com/aibrix/aibrix/pull/703, error messages returned from the engine will be exposed to the user with the same error code. Ignore the duplicate logging for Unauthorized; that's for debugging.
We can close this issue: https://github.com/vllm-project/aibrix/tree/main/development/vllm
I checked the JSON string conversion; that works. From the error, the connection is closed before the stream could end.
On further debugging, the root cause is the message format during streaming.

Expected response:

```
response_body_1
data: {ChatCompletionChunk}\n\n

response_body_2
data: {ChatCompletionChunk}\n\n
data: {ChatCompletionChunk}\n\n

response_body_n
data: [DONE]
```

Actual response:

```
response_body_1
{ChatCompletionChunk}\n\n

response_body_2
{ChatCompletionChunk}...
```
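To make the expected format concrete, here is a minimal sketch of how each streamed chunk should be serialized as a Server-Sent Event: every payload needs the `data: ` prefix and a blank-line terminator, and the stream must end with a literal `data: [DONE]`. The helper names and the chunk payload below are illustrative, not AIBrix code.

```python
import json

def format_sse_chunk(chunk: dict) -> str:
    """Serialize one streamed chunk as a Server-Sent Event.

    Each event must carry the "data: " prefix and be terminated by a
    blank line ("\n\n"); omitting the prefix (the "actual response"
    above) breaks SSE clients.
    """
    return f"data: {json.dumps(chunk)}\n\n"

def format_sse_done() -> str:
    """Terminal sentinel event closing the stream."""
    return "data: [DONE]\n\n"

# Hypothetical chunk payload for illustration:
event = format_sse_chunk({"choices": [{"delta": {"content": "Hi"}}]})
assert event.startswith("data: ") and event.endswith("\n\n")
```

With this framing, the proxy only needs to pass each `data: ...\n\n` frame through unmodified instead of stripping the prefix.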
Do you have the yaml by chance? Because this is not an issue.
Integration testing?
@ModiIntel I will ping you on Slack and schedule some time to discuss the different alternatives. Another option could be to split the deployment identifier from the model name. cc https://github.com/vllm-project/aibrix/issues/1086 - Each...
Had a discussion with @ModiIntel; the deployment/tenant identifier raised in https://github.com/vllm-project/aibrix/issues/1086 can help resolve the multi-tenancy use case as well. @ModiIntel will help drive the design and implementation. From...