traefik
traefik copied to clipboard
Refactor service load balancer to support different strategies
What does this PR do?
This change refactors the Weighted Round Robin load balancer to support different load balancing strategies.
Motivation
Currently the only load balancing strategy available within traefik is (weighted) round robin. While this is a good strategy in general, it can be problematic if one or more of the backends is getting overloaded.
There are many load balancing strategies, but the "power-of-two-random-choices" (p2c) algorithm has some good properties that make it a good general use algorithm.
Specifically some of the benefits include:
- reducing the total number of requests to slower
- constant time backend picking (in the general
- reduced "herd behaviour" compared to e.g. "least connections" load balancing.
The algorithm works by taking two backends at random, and choosing the backend that has the fewest in-flight requests. In order to do this, we have to track the number of in-flight requests each backend is currently processing.
The aim of this change is to demonstrate that this new load balancing strategy can be added with minimal changes, and reusing a lot of the existing load balancing code by factoring out the explicit strategy into an interface.
In order to do this, the wrr
package was removed, and the existing LoadBalancer
was moved to the parent directory: the loadbalancer
package.
There are many strategies that can be used for load balancing, many of which require "extrinsic" information, such as the CPU load on the backends. This change is not meant to open the door for adding such strategies, but attempts to add an effective load balancing strategy with low cost to the codebase.
This change does not integrate the new strategy into the rest of traefik: there would need to be more tests added, updates to documentation, and perhaps some investigation into performance / optimisations. I am willing to do this work, but didn't want to spend too much time on this if the change is not going to be accepted, so would like a design review.
As for how this would be specified, as discussed in the linked issue, I propose adding a strategy
under loadBalancer
, e.g.:
services:
Service01:
loadBalancer:
strategy: wrr
servers:
- url: foobar
weight: 42
- url: foobar
weight: 42
Service02:
loadBalancer:
strategy: p2c
servers:
- url: foobar
- url: foobar
More
- [ ] Added/updated tests
- [ ] Added/updated documentation
Additional Notes
There is initial discussion about this change here:
https://github.com/traefik/traefik/issues/6985
A private PR with more changes starting to integrate this into the traefik can be found here:
https://github.com/ifross89/traefik/pull/2
Hi, just wondering if there are any timelines around getting this reviewed? Do let me know if there is anything I can do to help the process. Thanks
There seem to be some conflicting lint rules in the pipeline:
$ golangci-lint run
pkg/server/service/loadbalancer/p2c.go:5: File is not `gofumpt`-ed (gofumpt)
pkg/server/service/loadbalancer/strategy_test.go:3: File is not `gofumpt`-ed (gofumpt)
import (
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gofumpt`-ed (gofumpt)
"math/rand/v2"
$ golangci-lint run --fix
$ golangci-lint run
pkg/server/service/loadbalancer/strategy_test.go:4: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
"math/rand/v2"
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
"testing"
pkg/server/service/loadbalancer/p2c.go:4: File is not `gci`-ed with --skip-generated -s standard -s default (gci)
crand "crypto/rand"
$ golangci-lint run --fix
$ golangci-lint run
pkg/server/service/loadbalancer/p2c.go:5: File is not `gofumpt`-ed (gofumpt)
pkg/server/service/loadbalancer/strategy_test.go:3: File is not `gofumpt`-ed (gofumpt)
import (
pkg/server/service/loadbalancer/strategy_test.go:6: File is not `gofumpt`-ed (gofumpt)
"math/rand/v2"
It is probably because:
- the
gci
tool does not seem to think themath/rand/v2
library is in the stdlib, probably as it is new - the `gci tool does not group standard lib packages that have an alias.
So either there can be changes to the linting in the pipeline or the code can be refactored to not have both rand
packages imported in the same file. I can take advice from a maintainer about which is more appropriate, but it shouldn't affect the bulk of the review.
the problem will be fixed when the mergeback of v2.11 will happen. FYI, it's a bug inside a linter, it's already fixed inside golangci-lint.
The expected import blocks are:
import (
"math/rand/v2"
"strconv"
"testing"
)
import (
crand "crypto/rand"
"math/rand/v2"
)
Very interesting PR. This will be a very valuable addition. I hope maintainers will make the design review soon.
Hi, do you have any opinion on this? Happy to close the PR if it isn't a direction you want to go in. Merci
Hello @ifross89,
Sorry for not getting back to you sooner, but I'll try to give you feedback on the changes this week. Thanks anyway for this contribution!
Hello @ifross89,
Sorry for the late answer/review. We finally took the time to (design) review the changes.
I have marked this PR as reviewable, but after a second thought here is some feedback (and I'll mark back the PR as design review state).
We think that the purpose of the introduced strategy
interface is a bit defeated as soon as the namedhandler
struct needs to hold the inflight req information.
We could afford to duplicate logic for a while since we do have not many load-balancer strategies so far and postpone the interface approach for now.
It would be better at first to introduce this new strategy as a concrete load balancer handler implementation.
Considering this, WDYT? Would you be inclined to iterate on the implementation in that way?
Hello @ifross89,
We were just wondering if you think you might want to iterate on this at some point?
Hey sorry for the lack of response.
Due to a new job & a new baby I am currently very short of time, and so can't say when I will be able to get round to this.
If someone else wants to take this forward then feel free, otherwise if I get some time, I may be able to do it myself. Unfortunately I do not know when that will be, however.
As for your comment about whether this should be a concrete load balancer or use the strategy interface. I think I agree that this change creates the new strategy
abstraction too early, and a concrete balancer is the better choice.
Hello @ifross89,
Thanks for your feedback and congrats ;-)
During today's triage session, the maintainers agreed to put it on the roadmap, so we'll rework the PR whenever we have time.