Varun Gupta comments

Results: 87 comments of Varun Gupta

[RFC]: Support for Multi-Tenant Model Deployments and Tenant-Aware Routing in AIBrix

In proposed approaches 1 and 2, isolation is only at the gateway component, but the actual GPU resources are shared. The gateway is horizontally scalable, so I do not see a value for...

Empty LB_IP when try Quickstart for AMD ROCm Cluster

@AlexHe99 Please change the service type for `envoy-aibrix-system-aibrix-eg-903790dc` from LoadBalancer to NodePort (@Jeffwan was referring to this service). You can revert the previous change made to the model's service. --- Another hacky...
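A minimal sketch of the suggested change, using the official `kubernetes` Python client. The service name comes from the comment; the namespace `envoy-gateway-system` is an assumption (where Envoy Gateway commonly places its proxy services), and the helper names are hypothetical.

```python
from typing import Any, Dict

# Service name taken from the comment above.
SERVICE_NAME = "envoy-aibrix-system-aibrix-eg-903790dc"
# Assumption: the gateway proxy Service lives in this namespace; adjust as needed.
NAMESPACE = "envoy-gateway-system"


def nodeport_patch() -> Dict[str, Any]:
    """Patch body that switches a Service from LoadBalancer to NodePort."""
    return {"spec": {"type": "NodePort"}}


def apply_patch(name: str = SERVICE_NAME, namespace: str = NAMESPACE) -> None:
    """Apply the patch to the cluster in the current kubeconfig context."""
    # Imported lazily so the module loads even where the client is not installed.
    from kubernetes import client, config

    config.load_kube_config()
    client.CoreV1Api().patch_namespaced_service(
        name=name, namespace=namespace, body=nodeport_patch()
    )
```

Once the type is NodePort, the gateway is reachable on any node IP at the assigned node port, so an empty LB_IP no longer blocks testing.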

Empty LB_IP when try Quickstart for AMD ROCm Cluster

@AlexHe99 Just wanted to check how testing is going; feel free to raise any issues you encounter. @lgy1027 From the other issue, /v1/chat/completions works as expected, and now you are trying rate...

envoy gateway speed limit and queuing mechanism

@ying2025 When using routing-strategy, a target pod is selected directly and the httproute is not used for request forwarding. For now, can you try the request without routing-strategy? It will use the httproute...
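The forwarding decision described above can be sketched as follows. This is an illustrative model only, not AIBrix's gateway code: it assumes lowercase header keys, models only a `random` strategy, and uses a hypothetical `HTTPROUTE` marker to stand for "forwarded by the HTTPRoute".

```python
import random
from typing import Dict, List, Optional

# Hypothetical marker meaning "let the HTTPRoute forward this request".
HTTPROUTE = "httproute"


def choose_target(headers: Dict[str, str], pods: List[str],
                  rng: Optional[random.Random] = None) -> str:
    """Return the pod picked by the gateway, or HTTPROUTE if no strategy is set."""
    strategy = headers.get("routing-strategy")
    if strategy is None:
        # No routing-strategy header: forwarding falls through to the HTTPRoute.
        return HTTPROUTE
    if strategy == "random":
        # A strategy is set: the gateway picks the pod itself, bypassing the HTTPRoute.
        rng = rng or random.Random()
        return rng.choice(pods)
    # Assumption: only "random" is modeled in this sketch.
    raise ValueError(f"unsupported routing strategy: {strategy}")
```

This is why gateway-level policies attached to the HTTPRoute (such as rate limiting) only apply when the request is sent without the routing-strategy header.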

[Feature]: Split deployment identifier from model deployment and add support for custom port for model deployment

- The goal is to split the deployment identifier and model-name. It is a good feature to have, but it requires careful design consideration, as it may create confusion for novice users. -...

feature: add simple session affinity plugins in gateway plugin

Can you describe what the workflow will be? - User directly sends "session-id": "1.1.1.1:8000" in the first request, OR the user makes the first request, then saves the target-pod-address and then...

feature: add simple session affinity plugins in gateway plugin

To summarize, the user starts directly with a session-id header set to a UUID from the first request (reducing the client's burden of reading the session-id header from the first request and applying it to subsequent requests). For the gateway,...

feature: add simple session affinity plugins in gateway plugin

Sounds good

feature: add simple session affinity plugins in gateway plugin

Overall LGTM. One nit: randomize the fallback route. You can also add documentation with a sample.
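A minimal sketch of the agreed flow: the client sends a UUID session-id from the first request, and the gateway pins that session to a pod, falling back to a random ready pod as suggested in the review. The in-memory map and the `SessionAffinityRouter` class are hypothetical; the gateway-side details are cut off in the comment above.

```python
import random
from typing import Dict, List, Optional


class SessionAffinityRouter:
    """Hypothetical session table mapping a session-id (e.g. a UUID) to a pod address."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._sessions: Dict[str, str] = {}
        self._rng = rng or random.Random()

    def route(self, session_id: str, ready_pods: List[str]) -> str:
        if not ready_pods:
            raise RuntimeError("no ready pods")
        pod = self._sessions.get(session_id)
        if pod not in ready_pods:
            # Unknown session, or its pod is gone: pick a random ready pod
            # (randomized fallback) and pin the session to it.
            pod = self._rng.choice(ready_pods)
            self._sessions[session_id] = pod
        return pod


if __name__ == "__main__":
    import uuid

    pods = ["10.0.0.1:8000", "10.0.0.2:8000"]
    router = SessionAffinityRouter()
    sid = str(uuid.uuid4())  # generated by the client once, before the first request
    assert router.route(sid, pods) == router.route(sid, pods)
```

A random fallback avoids sending every new or orphaned session to the same first pod in the list.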

ModelAdapter seems to be working abnormally

The model adapter status is correct. From the last condition, the status is bound, and instances lists the pod name on which the LoRA adapter is loaded. Can you try to run inference...