libvmod-dynamic

Add least connections and weighted least connections balancing algorithms

Open karlvr opened this issue 4 years ago • 1 comments

There is some code to tidy up in here (a x 1.25 in particular), but I am hoping to gauge your interest in merging something like this! I made these changes several years ago, and we've been running with them ever since, although I only just ported them to 6.6.

Let me know and I'll work on making it beautiful with you!

karlvr avatar Jun 03 '21 07:06 karlvr

So far, I only had a quick look: In general, I am open to the ideas. First comments:

  • For SRV record resolution via .service(), we already have an implementation of priorities and weights, see service_resolve() for how it works. If we want to add weighting to resolution of A-Records with .backend(), we should integrate the two.
  • Likewise, if we add (weighted) least connections, we should also add the option for SRV records.
  • In general, while I understand that least connections appears appealing, it is fundamentally racy because right now there is no clean API way to reserve a connection. I am a bit hesitant to add a half-baked implementation and would prefer to lay the foundation for a correct implementation in varnish-cache.
  • Regarding slow_start_max_connections, can you please elaborate on if/why you think this is the best option? In the shard director, we use a rampup time. While I agree that the time since a backend went healthy does not necessarily correlate with "how much it got warmed up already", in my experience it does in practice. Also, a time period has the advantage of being independent of the number of varnishes (or other servers) using a backend. If we went for the metric you propose, we should probably rename it to make clear that the rampup weight is proportional to the number of requests (connections are usually persisted via varnish-cache connection pooling).
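To illustrate the time-based rampup idea from the last point, here is a minimal sketch in Python. It is not the shard director's actual implementation: the function name `rampup_weight`, its parameters, and the linear ramp are assumptions chosen for illustration only.

```python
def rampup_weight(base_weight: float, healthy_for: float, rampup: float) -> float:
    """Hypothetical helper: scale a backend's weight by time since it went healthy.

    base_weight -- configured weight of the backend
    healthy_for -- seconds elapsed since the backend became healthy
    rampup      -- rampup period in seconds (0 or less disables the ramp)
    """
    # No ramp configured, or the ramp has completed: use the full weight.
    if rampup <= 0 or healthy_for >= rampup:
        return base_weight
    # Linear ramp (an assumption): weight grows in proportion to elapsed time.
    return base_weight * max(healthy_for, 0.0) / rampup
```

Because the input is elapsed time rather than a request or connection count, the result is the same no matter how many Varnish instances send traffic to the backend.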

On the practical side, this smells like a bit of work, and I would need to check with current sponsors whether they are interested in supporting integration work. This roadblock could of course be lifted by new sponsors ;)

nigoroll avatar Jun 03 '21 08:06 nigoroll

No sponsor turned up in 1.5 years

nigoroll avatar Nov 08 '22 19:11 nigoroll

@nigoroll We've continued to find good results with the least connections and weighted least connections algorithms. Especially with our volume of traffic and pauses in backend applications, we often find that round robin ends up piling connections onto a server that is slower or momentarily paused. The heuristic nature (rather than guarantee) of the least connections metric works fine, as it is just a heuristic to improve performance.
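As a rough illustration of the selection rule being discussed, here is a minimal Python sketch of (weighted) least connections. It is not the code from the proposed patch: the `Backend` class and `pick_least_connections` function are hypothetical names, and weights are assumed to be positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    weight: int = 1        # assumed to be > 0
    connections: int = 0   # currently open connections, as last observed


def pick_least_connections(backends: List[Backend]) -> Backend:
    """Return the backend with the lowest connections-per-weight ratio.

    With all weights equal to 1 this is plain least connections.
    Ties go to the backend that appears first in the list.
    """
    return min(backends, key=lambda b: b.connections / b.weight)
```

The connection counts are only read here, not reserved, so two concurrent picks can choose the same backend. That is the race mentioned earlier in the thread, and it is why the result is a heuristic rather than a guarantee.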

I have eliminated the slow start mechanism. We didn't require it and the implementation was a little odd.

I haven't used the SRV records stuff before but I can no doubt work out how to integrate least connections into it and work out if there's a nice way to use a similar implementation.

How are you feeling about this generally? Is it worth me putting a bit more effort into SRV so we could merge it? Or do you think it depends on changes in varnish-cache, and I should just keep porting my changes forward to each new varnish-cache version! :-)

karlvr avatar Jan 05 '23 04:01 karlvr