traefik icon indicating copy to clipboard operation
traefik copied to clipboard

Refactor service load balancer to support different strategies

Open ifross89 opened this issue 11 months ago • 10 comments

What does this PR do?

This change refactors the Weighted Round Robin load balancer to support different load balancing strategies.

Motivation

Currently the only load balancing strategy available within traefik is (weighted) round robin. While this is a good strategy in general, it can be problematic if one or more of the backends is getting overloaded.

There are many load balancing strategies, but the "power-of-two-random-choices" (p2c) algorithm has some good properties that make it a good general use algorithm.

Specifically some of the benefits include:

  • reducing the total number of requests to slower
  • constant time backend picking (in the general
  • reduced "herd behaviour" compared to e.g. "least connections" load balancing.

The algorithm works by taking two backends at random, and choosing the backend that has the fewest in-flight requests. In order to do this, we have to track the number of in-flight requests each backend is currently processing.

The aim of this change is to demonstrate that this new load balancing strategy can be added with minimal changes, and reusing a lot of the existing load balancing code by factoring out the explicit strategy into an interface.

In order to do this, the wrr package was removed, and the existing LoadBalancer was moved to the parent directory: the loadbalancer package.

There are many strategies that can be used for load balancing, many of which require "extrinsic" information, such as the CPU load on the backends. This change is not meant to open the door for adding such strategies, but attempts to add an effective load balancing strategy with low cost to the codebase.

This change does not integrate the new strategy into the rest of traefik: there would need to be more tests added, updates to documentation, and perhaps some investigation into performance / optimisations. I am willing to do this work, but didn't want to spend too much time on this if the change is not going to be accepted, so would like a design review.

As for how this would be specified, as discussed in the linked issue, I propose adding a strategy under loadBalancer, e.g.:

services:
    Service01:
      loadBalancer:
        strategy: wrr
        servers:
          - url: foobar
            weight: 42
          - url: foobar
            weight: 42
    Service02:
      loadBalancer:
        strategy: p2c
        servers:
          - url: foobar
          - url: foobar

More

  • [ ] Added/updated tests
  • [ ] Added/updated documentation

Additional Notes

There is initial discussion about this change here:

https://github.com/traefik/traefik/issues/6985

A private PR with more changes starting to integrate this into the traefik can be found here:

https://github.com/ifross89/traefik/pull/2

ifross89 avatar Mar 19 '24 18:03 ifross89

Hi, just wondering if there are any timelines around getting this reviewed? Do let me know if there is anything I can do to help the process. Thanks

ifross89 avatar Apr 03 '24 10:04 ifross89

There seem to be some conflicting lint rules in the pipeline:

$ golangci-lint run
pkg/server/service/loadbalancer/p2c.go:5: File is not `gofumpt`-ed (gofumpt)

pkg/server/service/loadbalancer/strategy_test.go:3: File is not `gofumpt`-ed (gofumpt)
import (
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gofumpt`-ed (gofumpt)

        "math/rand/v2"

$ golangci-lint run --fix
$ golangci-lint run      
pkg/server/service/loadbalancer/strategy_test.go:4: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
        "math/rand/v2"
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
        "testing"
pkg/server/service/loadbalancer/p2c.go:4: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
        crand "crypto/rand"

$ golangci-lint run --fix
$ golangci-lint run      
pkg/server/service/loadbalancer/p2c.go:5: File is not `gofumpt`-ed (gofumpt)

pkg/server/service/loadbalancer/strategy_test.go:3: File is not `gofumpt`-ed (gofumpt)
import (
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gofumpt`-ed (gofumpt)

        "math/rand/v2"

It is probably because:

  • the gci tool does not seem to think the math/rand/v2 library is in the stdlib, probably as it is new
  • the `gci tool does not group standard lib packages that have an alias.

So either there can be changes to the linting in the pipeline or the code can be refactored to not have both rand packages imported in the same file. I can take advice from a maintainer about which is more appropriate, but it shouldn't affect the bulk of the review.

ifross89 avatar Apr 03 '24 14:04 ifross89

the problem will be fixed when the mergeback of v2.11 will happen. FYI, it's a bug inside a linter, it's already fixed inside golangci-lint.

The expected import blocks are:

import (
	"math/rand/v2"
	"strconv"
	"testing"
)
import (
	crand "crypto/rand"
	"math/rand/v2"
)

ldez avatar Apr 03 '24 15:04 ldez

Very interesting PR. This will be a very valuable addition. I hope maintainers will make the design review soon.

jeflefou avatar Apr 15 '24 18:04 jeflefou

Hi, do you have any opinion on this? Happy to close the PR if it isn't a direction you want to go in. Merci

ifross89 avatar Apr 18 '24 18:04 ifross89

Hello @ifross89,

Sorry for not getting back to you sooner, but I'll try to give you feedback on the changes this week. Thanks anyway for this contribution!

rtribotte avatar May 13 '24 07:05 rtribotte

Hello @ifross89,

Sorry for the late answer/review. We finally took the time to (design) review the changes.

I have marked this PR as reviewable, but after a second thought here is some feedback (and I'll mark back the PR as design review state).

We think that the purpose of the introduced strategy interface is a bit defeated as soon as the namedhandler struct needs to hold the inflight req information. We could afford to duplicate logic for a while since we do have not many load-balancer strategies so far and postpone the interface approach for now. It would be better at first to introduce this new strategy as a concrete load balancer handler implementation. Considering this, WDYT? Would you be inclined to iterate on the implementation in that way?

rtribotte avatar May 29 '24 08:05 rtribotte

Hello @ifross89,

We were just wondering if you think you might want to iterate on this at some point?

rtribotte avatar Jun 24 '24 13:06 rtribotte

Hey sorry for the lack of response.

Due to a new job & a new baby I am currently very short of time, and so can't say when I will be able to get round to this.

If someone else wants to take this forward then feel free, otherwise if I get some time, I may be able to do it myself. Unfortunately I do not know when that will be, however.

As for your comment about whether this should be a concrete load balancer or use the strategy interface. I think I agree that this change creates the new strategy abstraction too early, and a concrete balancer is the better choice.

ifross89 avatar Jun 24 '24 14:06 ifross89

Hello @ifross89,

Thanks for your feedback and congrats ;-)

During today's triage session, the maintainers agreed to put it on the roadmap, so we'll rework the PR whenever we have time.

rtribotte avatar Jun 27 '24 12:06 rtribotte