
Inference traffic load balancing

Open justinSmileDate opened this issue 5 months ago • 4 comments

Thank you for your valuable work! I would like to know the design idea behind the inference traffic load balancing and where the code is.

justinSmileDate, Jul 01 '25 09:07

@justinSmileDate Thanks for the interest. By default we rely on the k8s Service to distribute the inference traffic. Additionally, we have a more elaborate design for, and support for, different ingress providers.
Please take a look at this for more detail: https://github.com/sgl-project/ome/tree/main/pkg/controller/v1beta1/inferenceservice/reconcilers/ingress

YouNeedCryDear, Jul 01 '25 18:07

Depending on the vendor, out of the box it uses k8s round robin. In our runtimes we use sgl router for better load balancing, as well as for PD (prefill-decode) load balancing.
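
To make "round robin" concrete, here is a minimal sketch of the idea: each request goes to the next endpoint in a fixed rotation, so load is spread evenly without looking at request content. This is an illustration only and not OME or Kubernetes code. The class name `RoundRobinBalancer` and the pod addresses are hypothetical, and the actual per-connection selection in a cluster depends on the kube-proxy mode (for example, iptables mode picks a backend at random, while IPVS mode defaults to round robin).

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical illustration of round-robin selection; not OME code."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        # cycle() yields the endpoints in order, restarting after the last one.
        self._rotation = cycle(self._endpoints)

    def pick(self):
        """Return the endpoint that should receive the next request."""
        return next(self._rotation)


# Made-up pod addresses standing in for the endpoints behind a Service.
pods = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
balancer = RoundRobinBalancer(pods)

# Send nine requests and count how many each pod received.
counts = Counter(balancer.pick() for _ in range(9))
print(counts)
```

With three endpoints and nine requests, each endpoint is picked three times. A content-aware router such as sgl router replaces this blind rotation with a decision based on the request and on backend state, which is why it is used in the runtimes.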

slin1237, Jul 01 '25 20:07

@justinSmileDate Thanks for the interest. By default we rely on the k8s Service to distribute the inference traffic. Additionally, we have a more elaborate design for, and support for, different ingress providers. Please take a look at this for more detail: https://github.com/sgl-project/ome/tree/main/pkg/controller/v1beta1/inferenceservice/reconcilers/ingress

Thank you for your professional reply! What I actually want to understand is how 'opts' is generated. I would expect the generation of 'opts' to include the traffic distribution strategy, but I don't see any part of the project that deals with traffic distribution. Can you tell me where it is?

The 'opts' code is here: https://github.com/sgl-project/ome/blob/main/pkg/controller/v1beta1/inferenceservice/reconcilers/ingress/reconciler.go#L90

justinSmileDate, Jul 02 '25 06:07

Depending on the vendor, out of the box it uses k8s round robin. In our runtimes we use sgl router for better load balancing, as well as for PD (prefill-decode) load balancing.

Thank you for your professional reply! I roughly understand the routing strategy, but I want to know what strategy is used at the "front door" to send traffic to different pods. Is it the k8s scheduling strategy?

justinSmileDate, Jul 02 '25 06:07