Apoorva Kulkarni
Look into how vLLM handles autoscaling and continuous batching under the hood, i.e., how to scale LLM inference efficiently. Use https://github.com/ray-project/llmperf for benchmarking.
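For context, a minimal sketch of what such a benchmark exercises: firing concurrent requests at a vLLM server so its continuous batching actually has requests to batch. The endpoint URL and model name below are assumptions, and llmperf is the proper harness for real numbers (per-token latency, throughput percentiles); this only illustrates the idea.

```python
# Sketch: send concurrent completion requests to a vLLM OpenAI-compatible
# endpoint and report mean request latency. URL and model are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:8000/v1/completions"    # assumed vLLM server address
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"         # hypothetical model name
CONCURRENCY = 16
PROMPT = "Summarize what continuous batching does for LLM serving."

def one_request(_):
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 128},
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(one_request, range(CONCURRENCY)))

print(f"mean latency: {sum(latencies) / len(latencies):.2f}s "
      f"over {CONCURRENCY} concurrent requests")
```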
Hi @manjarisri, thanks for the PR! Have you done any tests to make sure these stacks come up without any issues?
> Thanks for the great examples! I altered the JupyterHub on EKS example (for a private cluster accessed via a Tailscale VPN) and I'm now adding a Ray cluster and trying...
This is due to a mutating webhook introduced in AWS Load Balancer Controller (LBC) v2.5+. Per the docs:

> The AWS LBC provides a mutating webhook for service resources to set the spec.loadBalancerClass field for...
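If it helps with debugging, here is a small sketch (using the kubernetes Python client; the Service name and namespace are placeholders, and attribute names follow the client's snake_case mapping) to check whether the webhook has already stamped spec.loadBalancerClass on a Service:

```python
# Sketch: inspect a Service to see whether the LBC mutating webhook has set
# spec.loadBalancerClass. Name and namespace below are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

svc = v1.read_namespaced_service(name="proxy-public", namespace="jupyterhub")
print("type:", svc.spec.type)
print("loadBalancerClass:", svc.spec.load_balancer_class)  # set by the webhook if it ran
```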
I think this is a reasonable request. I will add it to our backlog.