Varun Gupta
### 🚀 Feature Description and Motivation Right now, heterogeneous features such as the optimizer or request tracing are enabled by default. Enabling them by default has two issues: 1) It adds small...
## Pull Request Description Right now, if a routing strategy is enabled, the gateway processes the request message for prefix-cache-aware routing. In this case, add "prompt" request input along with...
## Pull Request Description Right now the `make deploy` command uses `kubectl create` because the Ray-related CRDs are too large for `kubectl apply`. Since the Envoy proxy CRD is also...
## Pull Request Description While running benchmark tests, I came across a scenario where, if the token length is more than 4k and QPS > 100, I was getting connection timeout errors. Increasing...
## Pull Request Description [Please provide a clear and concise description of your changes here] ## Related Issues Resolves: #[Insert issue number(s)] **Important: Before submitting, please complete the description above...
### 🚀 Feature Description and Motivation The gateway currently caches the input request along with the pod it was routed to. This cached information is then used to perform prefix...
### 🚀 Feature Description and Motivation Add support for embedding API ### Use Case NA ### Proposed Solution _No response_
### 🐛 Describe the bug I noticed that metrics are not refreshed correctly per the interval. In the logs below, the interval is set to 1s, but for the decode-0 pod, the metric is refreshed...
### 🚀 Feature Description and Motivation For multi-modality use cases, add an option to download an image or video from remote storage such as S3. We can leverage the storage API: https://github.com/vllm-project/aibrix/tree/main/python/aibrix/aibrix/storage to download...
## Pull Request Description Create an image or video ``` - kubectl -n envoy-gateway-system port-forward service/envoy-aibrix-system-aibrix-eg-903790dc 8888:80 - curl -v "http://localhost:8888/v1/video/generations" \ -H "Content-Type: application/json" \ -d '{ "prompt": "a...