vulcand Load-Balance Least connection

[enhancement] In our scenario, we have 2 instances of vulcand routing traffic to a bunch of containers. Some of our containers can't handle that much traffic, and with 2+ instances there is a real risk of blocking a container. Hooray for the birthday paradox.

Would it be possible to implement cross-cluster least-connections load balancing?

I'm not sure etcd is fast enough to use as a backend for this.

Maybe if the instances had direct access to each other - UDP notifications could be a valid approach?

E.g.: when a connection is opened to an upstream, note it under the upstream and send a notification to the other instances. When a connection is closed, remove it and notify the others to remove it too.

To mitigate dropped UDP packets, non-local connections should have some timeout.

If this were implemented and the packets didn't arrive, it would just function as local least connections.

Comments/Ideas?

Oct 18 '14 23:10 fracklen

That would be possible, but it is hard enough that it's worth thinking about some alternatives before implementing it.

A couple of ideas/questions:

Scale of the problem

What are the absolute numbers - the amount of simultaneous connections and requests/second?

Leader/follower

I assume that you have two instances for HA. In this case, would a leader/follower pattern work? E.g. all traffic for one type of instance would go through one LB, and if the leader failed, the follower would take over, for example by taking over the IP.

Tuning load balancing / connection options

I would like to learn more about the use case, but it seems that you have instances overloaded with too many connections, which means that you'd need to add more instances and redirect load to them regardless of how many load balancers you have. The question is why that happens in the first place. Can you elaborate on this?

Oct 19 '14 18:10 klizhentas

Yes - we are running vulcand with two instances for HA. Yes, a leader/follower pattern would work.

Scale of the problem

We are running ~15 applications behind vulcand. Every app is a docker container with ~3 unicorn workers. Every app container runs on 2 docker hosts, giving ~6 unicorn workers per app. Sometimes we have 3 clients throwing a lot of traffic at one of the apps.

We are using registration services which do a health_check at regular intervals and update etcd accordingly.

Sometimes the traffic is so heavy that the health_check times out. This pretty quickly takes the app out of service :)

If the traffic is evenly distributed, this shouldn't be a problem. But sometimes, all the connections are routed to one container.

We will scale the apps up with more unicorn workers, so a single container can handle the entire load + health_check. It just seems like a lot to have more than double the needed capacity. We will also increase the registration service timeout - at the moment it's about 1 sec.

The above 2 will be my solution for the time being. I just see a lot of potential in the feature.

Oct 20 '14 14:10 fracklen

@klizhentas the irony is awesome... I had forgotten a client, which we have no control over: Mailgun :sleeping:

The application receives callbacks from Mailgun. I'm unsure how many concurrent requests Mailgun callbacks can fire for a single account. Maybe you know?

Oct 31 '14 05:10 fracklen

I can check and get back to you

Oct 31 '14 19:10 klizhentas
