
Improve BalancedPool balancing algorithm

Open mcollina opened this issue 4 years ago • 17 comments

Currently BalancedPool uses a simple round-robin algorithm even if the target upstream is not available.

A more nuanced algorithm could easily produce better performance.

This is a good paper describing a better algorithm: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.863.3985&rep=rep1&type=pdf.
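For context, the current behaviour is roughly plain round-robin over the upstreams, along these lines (an illustrative sketch only, not the actual BalancedPool source):

    // Plain round-robin: every request goes to the next upstream in order,
    // regardless of whether that upstream is healthy.
    class RoundRobinPicker {
      constructor (upstreams) {
        this.upstreams = upstreams
        this.index = 0
      }

      next () {
        const upstream = this.upstreams[this.index]
        this.index = (this.index + 1) % this.upstreams.length
        return upstream
      }
    }

    const picker = new RoundRobinPicker(['http://a:3000', 'http://b:3000'])
    console.log(picker.next(), picker.next(), picker.next()) // http://a:3000 http://b:3000 http://a:3000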

mcollina avatar Oct 18 '21 10:10 mcollina

Hi @mcollina

I'd like to work on this one. Can you assign me to it?

jodevsa avatar Oct 18 '21 18:10 jodevsa

go for it!

mcollina avatar Oct 18 '21 19:10 mcollina

@mcollina what is the reason behind introducing the balancedPool? Is it to support distributing the requests to multiple upstreams?

jodevsa avatar Oct 19 '21 12:10 jodevsa

Yes exactly. There can be a few situations where deploying a separate load balancer in front of a pool of services is not an option.

mcollina avatar Oct 19 '21 12:10 mcollina

Do you already have an idea what the algorithm should do?

I took a brief look into the paper; the algorithm used to decide the weights assumes that we are aware of the different request classes. That isn't the case in our situation, right? I'll also dig more into the paper in the following days.

I'm also taking a look at this: https://github.com/elastic/elastic-transport-js/blob/main/src/pool/WeightedConnectionPool.ts

It looks like they start each connection with the max weight, and then decrease the weight if the connection errored or timed out.

They also have an option to decrease the weight if the HTTP response code is 502, 503, or 504.
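For reference, a minimal sketch of that weighting idea (illustrative only; the function name and step sizes are invented here, this is not the elastic-transport-js code):

    // Decrease a connection's weight on errors/timeouts and on 502/503/504
    // responses, and slowly recover it on success (step sizes are arbitrary).
    const MAX_WEIGHT = 100

    function adjustWeight (connection, err, statusCode) {
      if (err || [502, 503, 504].includes(statusCode)) {
        connection.weight = Math.max(1, connection.weight - 10)
      } else {
        connection.weight = Math.min(MAX_WEIGHT, connection.weight + 1)
      }
    }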

jodevsa avatar Oct 19 '21 19:10 jodevsa

That seems like a really good start!

mcollina avatar Oct 19 '21 19:10 mcollina

@delvedor can surely help reviewing.

mcollina avatar Oct 19 '21 19:10 mcollina

Hi guys, I finally have a vacation, so hopefully I'll be able to wrap this up 🙏 I was stuck on testing and I'm currently rebasing on top of the new factory changes.

I need some help. I've been writing tests and noticed a behaviour in the algorithm where one server keeps being chosen many times in a row. I'll try to explain it with an example:

Let us assume we have two upstreams, A and B, with an initial weight of 50 for both. Also assume that our dynamic-weight logic applies a penalty of 19 points if an upstream request fails. If B fails, B's weight becomes 31 while A's weight stays at 50. The current weight starts at around 50, and the algorithm decreases it every N iterations (2 in our case) by the GCD, which is 1 here since GCD(50, 31) = 1. This results in the algorithm choosing server A more than 78 times ((50 - 11) / 1 * 2) before it shifts back to server B.
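For reference, here is a minimal sketch of the interleaved weighted round-robin selection loop I'm describing (illustrative only, not the final implementation):

    // Classic interleaved weighted round-robin: the current weight steps down
    // by the GCD of all weights each time the index wraps around, so a GCD of 1
    // means many passes where only the highest-weighted server qualifies.
    function gcdOf (values) {
      const gcd = (a, b) => (b === 0 ? a : gcd(b, a % b))
      return values.reduce(gcd)
    }

    function createPicker (servers) { // servers: [{ name, weight }]
      let index = -1
      let currentWeight = 0
      return function next () {
        while (true) {
          index = (index + 1) % servers.length
          if (index === 0) {
            currentWeight -= gcdOf(servers.map(s => s.weight))
            if (currentWeight <= 0) {
              currentWeight = Math.max(...servers.map(s => s.weight))
            }
          }
          if (servers[index].weight >= currentWeight) {
            return servers[index]
          }
        }
      }
    }

    const next = createPicker([{ name: 'A', weight: 50 }, { name: 'B', weight: 31 }])
    // With GCD(50, 31) = 1, next() keeps returning A for a long stretch before B.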

One solution here might be to choose a penalty P that divides the max weight evenly (max_weight % P == 0), something like 10. Then we'd get a GCD of 10, so B would be chosen again after 2 iterations ((50 - 10) / 10) * 2.

It would be very helpful if someone else could validate my findings and check whether this is acceptable behaviour. I'm also up for a call if anyone would like to discuss this further :)

jodevsa avatar Jul 05 '22 21:07 jodevsa

I think 78 times is not that bad from my understanding.

If we are under low load, always picking the same server is not a problem. If we are under high load, 78 requests does not look like a long wait: less than a second if we are sending around 100 req/sec.

What does the math look like for 3 or 5 servers?

mcollina avatar Jul 05 '22 21:07 mcollina

I've written a test runner as part of the balanced-pool.js tests to be able to easily exercise the algorithm. Please note that the results vary according to the starting weight per server and the error penalty.

Here is a successful test with 3 servers, a starting weight of 100, an error penalty of 7, and server A failing on the first request:

  {
    // number of requests to send using the undici client
    iterations: 100,
    startingWeightPerServer: 100,
    // the error penalty applied when the server was down or we got a socket hang-up
    errorPenalty: 7,
    config: [{ server: 'A', simulateUpstreamDownOnIterations: [0] }, { server: 'B' }, { server: 'C' }],
    // sequence of successfully served requests. Notice that A failed on iteration 0 because we
    // configured it with simulateUpstreamDownOnIterations: [0], and the algorithm picked it again after 14 requests
    expected: ['B', 'C', 'B', 'C', 'B', 'C', 'B', 'C', 'B', 'C', 'B', 'C', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    // the number of connection-refused errors the undici client hit; this is expected because we
    // configured server A to simulate a downed upstream on iteration 0
    expectedConnectionRefusedErrors: 1,
    expectedSocketErrors: 0,
    // out of the 100 iterations, A was called 29% of the time, B 35%, and C 35%
    expectedRatios: [0.29, 0.35, 0.35]
  },

Here is how it looks with 3 servers and no failing requests:

  {
    iterations: 100,
    startingWeightPerServer: 100,
    errorPenalty: 7,
    config: [{ server: 'A' }, { server: 'B' }, { server: 'C' }],
    expected: ['A', 'B', 'C', 'A', 'B', 'C'],
    expectedConnectionRefusedErrors: 0,
    expectedSocketErrors: 0,
    expectedRatios: [0.34, 0.33, 0.33]
  },

The load is distributed equally across the 3 servers.

jodevsa avatar Jul 06 '22 14:07 jodevsa

I'm not sure whether the errorPenalty and startingWeightPerServer should be values configured by the user or static ones defined by us.
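For context, a hypothetical shape if they were exposed as options (the option names here are invented, not a committed API):

    import { BalancedPool } from 'undici'

    // Hypothetical: startingWeightPerServer and errorPenalty as constructor
    // options with defaults (names are illustrative only).
    const pool = new BalancedPool(
      ['http://upstream-a:3000', 'http://upstream-b:3000', 'http://upstream-c:3000'],
      {
        startingWeightPerServer: 100, // weight every upstream starts with
        errorPenalty: 7 // weight subtracted when a request to an upstream fails
      }
    )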

jodevsa avatar Jul 06 '22 14:07 jodevsa

It's a good idea to make them configurable with good defaults.

mcollina avatar Jul 06 '22 15:07 mcollina

@mcollina do we have an estimate of how many upstreams would typically be added? Something like a max of 100?

I'm asking because the algorithm currently suffers from a case where the complexity can be up to O(maxWeight) if the GCD between all the weights is 1. We could optimise that to O(N) by either calculating the next lower weight directly from currentWeight, or by keeping the logic the same and making sure we never end up with weights whose GCD is 1.
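A sketch of the first option (illustrative only): instead of stepping currentWeight down by the GCD until some server qualifies, scan the weights once for the greatest weight below the current one:

    // O(N) alternative to repeatedly subtracting the GCD: find the greatest
    // weight strictly below currentWeight in one pass, wrapping around to the
    // maximum weight when nothing lower exists.
    function nextLowerWeight (weights, currentWeight) {
      let next = 0
      for (const weight of weights) {
        if (weight < currentWeight && weight > next) {
          next = weight
        }
      }
      return next > 0 ? next : Math.max(...weights)
    }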

I've also noticed that the elastic-transport-js library has a bug due to this assumption https://github.com/elastic/elastic-transport-js/blob/main/src/pool/WeightedConnectionPool.ts#L54

    // we should be able to find the next node in 1 array scan,
    // if we don't, it means that we are in an infinite loop

This is wrong, especially when the GCD is 1; that is why the algorithm mentioned in the paper uses a while(true) loop. In the worst case we need to loop maxWeight times to find the next lower weight.

jodevsa avatar Jul 06 '22 17:07 jodevsa

I would not expect more than 100 peers.

mcollina avatar Jul 06 '22 21:07 mcollina

I'm seeing a weird behaviour where I have 3 upstreams (A, B, C) and only one of them, C, goes down. After C goes down, upstreams A and B also emit a disconnect event (.on('disconnect')) with a socket idle timeout. Does this ring a bell @mcollina? If not, I'll try to prepare a test that reproduces the problem.

jodevsa avatar Jul 07 '22 21:07 jodevsa

Looks like a bug to me.

mcollina avatar Jul 07 '22 22:07 mcollina

My bad, it seems related to the test server keep-alive timeout configuration :(
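For anyone hitting the same thing: assuming the test upstreams are plain node:http servers, the idle disconnects come from the server's default 5 second keepAliveTimeout, which can be raised for the duration of the test:

    import { createServer } from 'node:http'

    // The test upstreams close idle sockets after Node's default 5 s
    // keepAliveTimeout, which shows up as 'disconnect' events on the clients;
    // raising it keeps the sockets open while the test runs.
    const server = createServer((req, res) => res.end('ok'))
    server.keepAliveTimeout = 30_000
    server.listen(0)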

jodevsa avatar Jul 07 '22 22:07 jodevsa

Hi @mcollina, can we close this issue now?

jodevsa avatar Oct 14 '22 22:10 jodevsa

yes

mcollina avatar Oct 15 '22 07:10 mcollina