Varun Gupta
We will leave the base case at 120s and provide an option for users to change it via [config using a production overlay](https://github.com/vllm-project/aibrix/issues/847).
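For reference, a production overlay override could look like the kustomize-style sketch below. Note this is illustrative only: the deployment name, flag name (`--request-timeout`), and paths are assumptions, not the actual AIBrix schema.

```yaml
# overlays/production/kustomization.yaml (illustrative sketch, not the real AIBrix layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: gateway-plugins   # hypothetical target deployment
    patch: |-
      # Hypothetical flag: raise the 120s base timeout in production
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: "--request-timeout=300s"
```

The base keeps the 120s default, and only environments that need a longer timeout carry the patch.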
@Venkat2811 The integration test [failed](https://github.com/vllm-project/aibrix/actions/runs/15141937232/job/42568252462?pr=1112); can you please check the error?
After this PR: https://github.com/aibrix/aibrix/pull/703, error messages returned from the engine will be exposed to the user with the same error code. Ignore the duplicate logging for Unauthorized; that's for debugging.
We can close this issue: https://github.com/vllm-project/aibrix/tree/main/development/vllm
I checked the JSON string conversion; that works. From the error, the connection is closed before the stream could end.
On further debugging, the root cause is the message format during streaming.

Expected response:

```
response_body_1
data: {ChatCompletionChunk}\n\n

response_body_2
data: {ChatCompletionChunk}\n\n
data: {ChatCompletionChunk}\n\n

response_body_n
data: [DONE]
```

Actual response:

```
response_body_1
{ChatCompletionChunk}\n\n

response_body_2
{ChatCompletionChunk}...
```
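To make the expected format concrete, here is a minimal sketch of how each streamed chunk should be serialized as a Server-Sent Event: every payload needs the `data: ` prefix and a blank-line terminator, and the stream must end with a literal `data: [DONE]`. The helper names and the chunk payload below are illustrative, not AIBrix code.

```python
import json

def format_sse_chunk(chunk: dict) -> str:
    """Serialize one streamed chunk as a Server-Sent Event.

    Each event must carry the "data: " prefix and be terminated by a
    blank line ("\n\n"); omitting the prefix (the "actual response"
    above) breaks SSE clients.
    """
    return f"data: {json.dumps(chunk)}\n\n"

def format_sse_done() -> str:
    """Terminal sentinel event closing the stream."""
    return "data: [DONE]\n\n"

# Hypothetical chunk payload for illustration:
event = format_sse_chunk({"choices": [{"delta": {"content": "Hi"}}]})
assert event.startswith("data: ") and event.endswith("\n\n")
```

With this framing, the proxy only needs to pass each `data: ...\n\n` frame through unmodified instead of stripping the prefix.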
Do you have the yaml by chance? Because this is not an issue.
Integration testing?
@ModiIntel I will ping you on Slack and schedule some time to discuss the different alternatives. Another option could be to split the deployment identifier from the model name. cc https://github.com/vllm-project/aibrix/issues/1086 - Each...
Had a discussion with @ModiIntel; the deployment/tenant identifier raised in https://github.com/vllm-project/aibrix/issues/1086 can help resolve the multi-tenancy use case as well. @ModiIntel will help drive the design and implementation. From...