Thomas Jack Carroll
One use-case I'd love to see supported as a tenant-aware optimization is tenant-based LoRA adapters.
The [Quickstart Model Sample](https://github.com/vllm-project/aibrix/blob/main/samples/quickstart/model.yaml) already includes checks, but they are too tight for the current model download. 120 seconds is not enough. Going to log an issue and will link...
I'm willing to take this up
# Research on Potential Kustomize Version Issues

Kustomize added support for Helm charts in [v4.1.0](https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv4.1.0). My current kubectl client (found with `kubectl version`) is built with kustomize v5.5.0. Our CI...
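As a rough illustration of the version check described above, here is a minimal Python sketch that tests whether a reported kustomize version is at or past v4.1.0. The helper names (`parse_version`, `supports_helm`) are hypothetical, and the sketch assumes the version string contains a plain `MAJOR.MINOR.PATCH` triple like the one `kubectl version` prints.

```python
import re

# Helm chart support landed in kustomize v4.1.0 (per the release linked above).
HELM_SUPPORT_MIN = (4, 1, 0)


def parse_version(text: str) -> tuple:
    """Extract the first MAJOR.MINOR.PATCH triple found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_helm(version_text: str) -> bool:
    """Return True if the given kustomize version is v4.1.0 or newer."""
    return parse_version(version_text) >= HELM_SUPPORT_MIN
```

For example, `supports_helm("v5.5.0")` is true, while a version string such as `"v4.0.5"` would fall below the threshold.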
## Another Potential Issue Regarding Usage

The [docs](https://github.com/kubernetes-sigs/kustomize/blob/master/examples/chart.md#but-its-not-really-about-performance) for Kustomize state:

> Although the helm related fields discussed above are handy for experimentation and development, it's best to avoid them...
Good news: Kubebuilder has added a Helm plugin (currently in alpha). See [here](https://book.kubebuilder.io/plugins/available/helm-v1-alpha). Might be worth revisiting this issue.
@Jeffwan it would also be nice to be able to schedule the gateway (and control plane) on separate nodes from the core GPU nodes running the vLLM pods, because gateway workloads are largely...
Just ran into this on MicroK8s as well. The Kubernetes API has a hard limit of 256 KiB on the total size of an object's annotations, and client-side apply adds an annotation for the last version, which...
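To make the size limit concrete, here is a small Python sketch that estimates whether a manifest would fit in the annotation that client-side apply writes (`kubectl.kubernetes.io/last-applied-configuration`). The function names are hypothetical, and the size is only an approximation: it assumes the annotation holds a compact JSON copy of the manifest and ignores any other annotations on the object.

```python
import json

# The API server caps the combined size of annotation keys and values at 256 KiB.
ANNOTATION_LIMIT_BYTES = 256 * 1024
LAST_APPLIED_KEY = "kubectl.kubernetes.io/last-applied-configuration"


def last_applied_size(manifest: dict) -> int:
    """Approximate the bytes client-side apply would store for this manifest."""
    payload = json.dumps(manifest, separators=(",", ":"))
    # Both the annotation key and its value count toward the limit.
    return len(LAST_APPLIED_KEY.encode("utf-8")) + len(payload.encode("utf-8"))


def fits_client_side_apply(manifest: dict) -> bool:
    """Return True if the estimated annotation size stays within the limit."""
    return last_applied_size(manifest) <= ANNOTATION_LIMIT_BYTES
```

Large CRDs are the usual case where this estimate goes over. Server-side apply (`kubectl apply --server-side`) tracks field ownership in `managedFields` instead of writing this annotation.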