sozu icon indicating copy to clipboard operation
sozu copied to clipboard

add more load balancing strategies

Open Geal opened this issue 6 years ago • 10 comments

right now we only support randomization:

https://github.com/sozu-proxy/sozu/blob/master/lib/src/network/backends.rs#L183-L193

Geal avatar Mar 16 '18 14:03 Geal

Do you have any idea of which strategies you want to add and how that will effect the configuration file ?

NotBad4U avatar Mar 21 '18 13:03 NotBad4U

not really, I was just thinking again about https://www.youtube.com/watch?v=6NdxUY1La2I and how it could apply. I have not thought in depth about which algorithms should be added and how we would configure them

Geal avatar Mar 21 '18 14:03 Geal

A first step could be to take a look at the list of algorithms proposed by haproxy.

NotBad4U avatar Mar 21 '18 14:03 NotBad4U

Please list them here with short descriptions

On Wed, Mar 21, 2018, 15:25 Alessio Coltellacci [email protected] wrote:

A first step could be to take a look at the list of algorithms proposed by haproxy https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-balance.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/sozu-proxy/sozu/issues/409#issuecomment-374955367, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHSAJzcG6Zswnp_BexjAk9q2kJbvib6ks5tgmLugaJpZM4St5xH .

Geal avatar Mar 21 '18 14:03 Geal

Haproxy Load balancing policies

roundrobin

Each server is used in turns, according to their weights. This algorithm is dynamic, which means that server weights may be adjusted on the fly for slow starts for instance.

static-rr

Same as roundrobin except that you can't change a server weight on the fly.

leastconn

The server with the lowest number of connections receives the connection.

first

The first server with available connection slots receives the connection. NOTE: Seem's to be not possible to currently add this one in sozu because we don't manage a connection pool.

source (IP hash)

The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request.

URI

This algorithm hashes either the left part of the URI (before the question mark) or the whole URI (if the "whole" parameter is present) and divides the hash value by the total weight of the running servers.

hdr (I don't understand this one)

The HTTP header will be looked up in each HTTP request.

rdp-cookie

The RDP cookie will be looked up and hashed for each incoming HTTP request.

Envoy Load balancing policies

roundrobin

same as Haproxy

Weighted least request

The least request load balancer uses an O(1) algorithm which selects two random healthy hosts and picks the host which has fewer active requests.

Ring hash

The ring/modulo hash load balancer implements consistent hashing to upstream hosts. The algorithm is based on mapping all hosts onto a circle such that the addition or removal of a host from the host set changes only affect 1/N requests. This technique is also commonly known as ketama hashing.

Maglev

The Maglev load balancer implements consistent hashing to upstream hosts. From this paper The idea is to generate a large lookup table with each backend taking a number of entries in the table. Envoy implement this one with a fixed table size of 65537.

Random

with already have this one.

NotBad4U avatar Apr 14 '18 15:04 NotBad4U

I'll take this one.

To begin with, I'll implement the load balacing algorithms:

  • round robin
  • static round robin
  • leastconns

I started by watching how Haproxy implement them.

Data structure

In their LB parameters for all algorithms: struct lbprm nested in the struct proxy They use the data structure Elastic Binary Trees - ebtree for the both algorithm (src/lb_fwlc.c, src/lb_fwrr.c). This ebtree is balance by the weight of the server. They just iterate on it to get the next server.

ebtree purpose

In short, the main difference between a regular binary tree and ebtree s that in a regular binary tree, you need to allocate intermediary nodes to attach leaves, and in some environments, having to allocate a node in the middle just to insert a leaf is not convenient. With ebtrees, each structure is both a node and a leaf, and thanks to some pointer manipulation, both of them can be used separately. And this possibility comes with a number of interesting properties described in the article above such as O(1) removal, support for duplicate keys, etc...

The benefit of using ebtrees in haproxy compared to rbtrees is the O(1) removal which makes ebtrees much faster than rbtrees for the scheduler where entries are constantly added/removed. And compared to BST (which was the original design leading to ebtrees), insertion is very fast (no malloc) and remoal doesn't require a free().

A new version is under development to save space. It will have the same complexity as rbtrees but with smaller memory usage. This will be useful to store lots of data which are often looked up and rarely removed (eg: haproxy's stick tables, caches, ...).

More infos: wtarreau.blogspot/elastic-binary-trees-ebtree

Weight configuration

Each server starts with weight of 100 (The range is 0 - 256). You can specify the weight in the configuration file like that:

server web1 10.10.10.10 weight 100
server web2 10.10.10.11 weight 50

How dynamic weight work

An health check run every <T mins and the agent will return up plus weight value or down upon heath check. The weight percentage returned is calculated based on the system load in the last T min.

NOTE: The method server_recalc_eweight() update periodicaly the weight of the servers.

Queue servers in the ebtree

The difference between the usage of the ebtree between Round Robin and Least conn is mostly how they queue their backends.

The calc made by Round Robin:

/* Queue a server in the weight tree <root>, assuming the weight is >0.
 * We want to sort them by inverted weights, because we need to place
 * heavy servers first in order to get a smooth distribution.
 */
static inline void fwrr_queue_by_weight(struct eb_root *root, struct server *s)
{
    s->lb_node.key = SRV_EWGHT_MAX - s->eweight;
    eb32_insert(root, &s->lb_node);
    s->lb_tree = root;
}

NOTE: eb32_insert() method from src/ebtree.c

The calc made by Least conns:

/* Queue a server in its associated tree, assuming the weight is >0.
 * Servers are sorted by #conns/weight. To ensure maximum accuracy,
 * we use #conns * SRV_EWGHT_MAX / eweight as the sorting key.
 */
static inline void fwlc_queue_srv(struct server *s)
{
    s->lb_node.key = s->served * SRV_EWGHT_MAX / s->eweight;
    eb32_insert(s->lb_tree, &s->lb_node);
}

Where: conns = nb of active sessions currently being served SRV_EWGHT_MAX == (SRV_UWGHT_MAX * BE_WEIGHT_SCALE) == 256 * 16 (from define value)

And

/* The scale factor between user weight and effective weight allows smooth
 * weight modulation even with small weights (eg: 1). It should not be too high
 * though because it limits the number of servers in FWRR mode in order to
 * prevent any integer overflow. The max number of servers per backend is
 * limited to about (2^32-1)/256^2/scale ~= 65535.9999/scale. A scale of 16
 * looks like a good value, as it allows 4095 servers per backend while leaving
 * modulation steps of about 6% for servers with the lowest weight (1).
 */
#define BE_WEIGHT_SCALE 16

I don't understand the purpose of BE_WEIGHT_SCALE despite the text above.

Backup server

Haproxy have a interesting feature: backup

When "backup" is present on a server configuration line, the server is only used in load balancing when all other non-backup servers are unavailable.

Haproxy queue active and backup servers in two distinct Elastic Binary Trees.

How to configure

server web1 10.10.10.10 weight 100 backup
server web2 10.10.10.11 weight 50

Proposals (draft)

  • We don't have a crates for ebtree in Rust. But instead of creating a new one, I propose to start with a basic regular binary tree or rbtrees.

  • Enable to modify the data structure which represent the list of backends in BackendList according to the algorithm use:

#[derive(Debug)]
pub struct BackendList {
  pub backends:  Vec<Rc<RefCell<Backend>>>,
            //^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  pub next_id:   u32,
}
  • Add a reference to struct METRICS in BackendMap.

  • Change how we declare the list backends in an application to enable to declare weight.

NotBad4U avatar Apr 24 '18 15:04 NotBad4U

We don't have a crates for ebtree in Rust. But instead of creating a new one, I propose to start with a basic regular binary tree or rbtrees.

The load balancing strategy is defined per application, so the list of backends probably won't be very large, and the list of backends for an application does not change very often. Be careful with fancy algorithm choices because for that number of nodes, some naive algorithm could get the job done.

Add a reference to struct METRICS in BackendMap.

why? It's a global variable, it's available from anywhere. What do you need it for?

Change how we declare the list backends in an application to enable to declare weight

yes, we need that, and a field in applications to define the load balancing strategy, and a field in the configuration to define the default load balancing strategy.

The difference for round robin VS static round robin does not make much sense for sozu right now: the weights could be updated by sending a configuration message for the backend, and we don't implement health checks yet.

Implementing round robin and leastconn should be good enough for now.

Geal avatar Apr 30 '18 13:04 Geal

The load balancing strategy is defined per application, so the list of backends probably won't be very large, and the list of backends for an application does not change very often.

Okai so a simple Vec sorted will be enough.

why? It's a global variable, it's available from anywhere. What do you need it for?

My bad srry I forgot it

NotBad4U avatar Apr 30 '18 13:04 NotBad4U

I'm not sure how it's called in haproxy but they also have a backup system: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#5.2-backup

I don't know if this falls into a load balancing strategy or if this is a different type of implementation

BlackYoup avatar Jul 09 '18 22:07 BlackYoup

We should take a look at the load balancing features made by Linkerd: https://linkerd.io/1/features/load-balancing/

NotBad4U avatar Nov 14 '18 15:11 NotBad4U