Varun Gupta

Results: 87 comments by Varun Gupta

In proposed approaches 1 and 2, isolation is only at the gateway component, but the actual GPU resources are shared. The gateway is horizontally scalable, so I do not see a value for...

@AlexHe99 Please change the service type for `envoy-aibrix-system-aibrix-eg-903790dc` from LoadBalancer to NodePort (@Jeffwan was referring to this service). You can revert the previous change made to the model's service. --- Another hacky...

@AlexHe99 Just wanted to check how testing is going; feel free to raise any issues you encounter. @lgy1027 From the other issue, /v1/chat/completions works as expected, and now you are trying rate...

@ying2025 When routing-strategy is used, a target pod is selected directly and the httproute is not used for request forwarding. For now, can you try the request without routing-strategy? It will use the httproute...

- The goal is to split the deployment identifier and model-name. It is a good feature to have but requires careful design, as it may confuse novice users. -...

Can you describe what the workflow will be? - The user directly sends "session-id": "1.1.1.1:8000" in the first request OR the user makes the first request, then takes a backup of target-pod-address and then...

To summarize, the user starts directly with a UUID in the session-id header from the first request (removing the client burden of reading the session-id header from the first response and applying it to subsequent requests). For gateway,...
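The client side of that summary can be sketched as below. This is only an illustration under assumptions: the header name `session-id` is taken from the discussion above, while the helper `new_session_headers` and the `Content-Type` value are hypothetical and not part of any gateway API.

```python
import uuid


def new_session_headers(session_id=None):
    """Build request headers that carry a client-chosen session id.

    Hypothetical helper: if no id is passed, a fresh UUID is generated,
    so the client never has to read an id back from a response.
    """
    sid = session_id or str(uuid.uuid4())
    # Assumption: the gateway reads the id from a header named "session-id".
    return {"session-id": sid, "Content-Type": "application/json"}


# First request: the client picks the UUID itself.
first = new_session_headers()
# Subsequent requests: reuse the same id so they share one session.
follow_up = new_session_headers(first["session-id"])
```

The point of the design is that the id is fixed before the first request goes out, so the first and all later requests carry the same value.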

Overall LGTM. One nit: randomize the fallback route. You can also add documentation with a sample.

The model adapter status is correct. From the last condition, the status is bound, and the instances list the name of the pod on which the LoRA adapter is loaded. Can you try to run inference...