client icon indicating copy to clipboard operation
client copied to clipboard

Sticky Load Balancing during inference not hitting all the Triton Servers sitting behind the Load Balancer

Open animikhaich opened this issue 11 months ago • 1 comments

I have an architecture that looks like this:

Image

The main issue is that the load balancer is not distributing the traffic across the different server VMs. Instead at any given time I only see 40-50% of the VMs under full load being used. I am closing the gRPC client after each request using client.close(). However, it seems that session is still sticky, which is likely the root cause.

I am using Azure Load Balancer.

Would love to know if anyone has faced this issue and resolved it. If so, how?

animikhaich avatar Feb 13 '25 16:02 animikhaich

+1 this is a big issue, how has this not been responded to at all?

patches11 avatar Sep 24 '25 23:09 patches11