Apoorva Kulkarni

65 comments by Apoorva Kulkarni

Look into how vLLM handles autoscaling and continuous batching under the hood; both are key to scaling LLM inference efficiently. Use https://github.com/ray-project/llmperf for benchmarking.
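For a concrete starting point, here is a minimal sketch of vLLM's offline API, which applies continuous batching across submitted prompts automatically. The model id, prompts, and sampling settings are placeholders, and this assumes `vllm` is installed with a GPU available:

```python
# Minimal sketch: vLLM schedules all prompts onto its engine, and continuous
# batching lets new requests join in-flight batches instead of waiting for a
# full static batch to finish.
from vllm import LLM, SamplingParams

# Assumption: this model id is only an illustration; use any model you can pull.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain continuous batching in one sentence.",
    "Why does paged attention help LLM serving throughput?",
]

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```

llmperf can then be pointed at a vLLM-backed serving endpoint to measure token throughput and latency under concurrent load.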

Hi @manjarisri, thanks for the PR! Have you done any tests to make sure these stacks come up without any issues?

> Thanks for the great examples! I altered the JupyterHub on EKS example (for a private cluster accessed via a Tailscale VPN) and I'm now adding a Ray cluster and trying...

This is due to a mutating webhook introduced in AWS Load Balancer Controller (LBC) v2.5+. Per the docs:

> The AWS LBC provides a mutating webhook for service resources to set the spec.loadBalancerClass field for...
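For illustration, a minimal sketch of the effect using the Kubernetes Python client: creating a `LoadBalancer` Service with `spec.loadBalancerClass` set explicitly to the class the LBC webhook would otherwise default it to. The service name, selector, and ports here are hypothetical, and this assumes a kubeconfig pointing at a cluster running LBC v2.5+:

```python
# Minimal sketch (assumption: `kubernetes` Python client installed and a
# kubeconfig pointing at a cluster with AWS LBC v2.5+ deployed).
from kubernetes import client, config

config.load_kube_config()

svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="demo-svc"),  # hypothetical name
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        # With LBC v2.5+, the mutating webhook defaults this field on
        # LoadBalancer Services at create time; setting it explicitly
        # makes the behavior visible and deterministic.
        load_balancer_class="service.k8s.aws/nlb",
        selector={"app": "demo"},  # hypothetical selector
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="default", body=svc)
```

Setting the field yourself is one way to avoid surprises if the webhook's defaulting behavior changes between controller versions.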

I think this is a reasonable request. I will add it to our backlog.