varnish-cache
Add .reset_stickiness() to fallback director
This trivial PR adds a .reset_stickiness() method to fallback directors. It simply allows resetting the director's internal stickiness cursor from VCL so that the director starts using the highest-priority healthy backend again.
The motivation for this PR is a scenario where multiple sticky fallback directors are in use and stickiness must be reset in one specific director only. Reloading VCL is a no-go because it would reset stickiness for all fallback directors; even if all backends in all directors are healthy again, extra control is needed to guarantee that stickiness is reset in a granular and controlled way. Similarly, using varnishadm backend.set_health on the backends handled by the fallback director to be reset is also a no-go, because backends could be shared among multiple directors.
I hope you find it useful! :)
this looks sensible, but out of curiosity, can you explain the use case behind it? In which case will you want to do that?
Of course. This comes from a live & VoD video streaming environment. Video sources are grouped in clusters of two servers: one server is the master for live streams (although in a degraded scenario, i.e. when the other server in the cluster goes sick, it can also serve VoD streams) and the other one is the master for VoD streams (although in a degraded scenario it can also serve live streams). This is what the VCL looks like:
backend b1 { }
backend b2 { }
...
backend b10 { }
sub vcl_init {
    # Channels 1 to 100.
    new live1 = directors.fallback(sticky=true);
    live1.add_backend(b1);
    live1.add_backend(b2);
    new vod1 = directors.fallback(sticky=true);
    vod1.add_backend(b2);
    vod1.add_backend(b1);

    # Channels 101 to 200.
    new live2 = directors.fallback(sticky=true);
    live2.add_backend(b3);
    live2.add_backend(b4);
    new vod2 = directors.fallback(sticky=true);
    vod2.add_backend(b4);
    vod2.add_backend(b3);
    ...
}
Cluster 1 (i.e. the live1 and vod1 directors) is in charge of some channels, cluster 2 is in charge of some other channels, etc. The channel identifier and the type of traffic (i.e. live or VoD) are always available in URLs, so mapping URLs to directors is trivial. Finally, stickiness is needed because if, for example, b1 goes sick and b2 starts serving both VoD and live streams, you don't want b1 to resume live streaming as soon as it is healthy. Due to synchronization between both servers in the cluster, you need to do the switch back to b1 in a controlled way.
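To make the URL-to-director mapping concrete, here is a minimal sketch for cluster 1 only. The backend addresses and the URL layout (/live/<channel>/... and /vod/<channel>/...) are assumptions made up for illustration; the real definitions and URL scheme are not shown in this thread.

```vcl
vcl 4.1;

import directors;

# Hypothetical addresses: the backend definitions are elided in the thread.
backend b1 { .host = "192.0.2.1"; .port = "80"; }
backend b2 { .host = "192.0.2.2"; .port = "80"; }

sub vcl_init {
    # b1 is the primary for live streams, b2 for VoD streams.
    new live1 = directors.fallback(sticky = true);
    live1.add_backend(b1);
    live1.add_backend(b2);

    new vod1 = directors.fallback(sticky = true);
    vod1.add_backend(b2);
    vod1.add_backend(b1);
}

sub vcl_recv {
    # Assumed URL layout; channels 1 to 100 belong to cluster 1.
    if (req.url ~ "^/live/([1-9][0-9]?|100)/") {
        set req.backend_hint = live1.backend();
    } elseif (req.url ~ "^/vod/([1-9][0-9]?|100)/") {
        set req.backend_hint = vod1.backend();
    } else {
        return (synth(404, "Unknown channel"));
    }
}
```

With sticky=true, once live1 has fallen back to b2 it keeps using b2 even after b1 becomes healthy again, which is exactly the behaviour that later needs an explicit reset.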
Now suppose live1, vod1 and live2 are all degraded (i.e. using their secondary backend). What is needed is a way of moving traffic back to the primary backend of each director, one director at a time. Reloading VCL or playing with varnishadm backend.set_health won't help here. That's why .reset_stickiness() + some ad-hoc VCL is needed.
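As an illustration of what that ad-hoc VCL could look like, here is a rough sketch that exposes a restricted HTTP endpoint per director. It only compiles with this PR applied, since .reset_stickiness() does not exist otherwise. The /admin/reset-stickiness/ URLs, the operators ACL and the backend addresses are hypothetical, and whether the method may be called from vcl_recv is an assumption about the patch.

```vcl
vcl 4.1;

import directors;

# Hypothetical addresses: the backend definitions are elided in the thread.
backend b1 { .host = "192.0.2.1"; .port = "80"; }
backend b2 { .host = "192.0.2.2"; .port = "80"; }

# Hypothetical ACL: only these clients may reset stickiness.
acl operators { "127.0.0.1"; }

sub vcl_init {
    new live1 = directors.fallback(sticky = true);
    live1.add_backend(b1);
    live1.add_backend(b2);

    new vod1 = directors.fallback(sticky = true);
    vod1.add_backend(b2);
    vod1.add_backend(b1);
}

sub vcl_recv {
    if (req.method == "POST" && req.url ~ "^/admin/reset-stickiness/") {
        if (client.ip !~ operators) {
            return (synth(403, "Forbidden"));
        }
        # Reset exactly one director; the others keep their sticky backend.
        if (req.url == "/admin/reset-stickiness/live1") {
            live1.reset_stickiness();
        } elseif (req.url == "/admin/reset-stickiness/vod1") {
            vod1.reset_stickiness();
        } else {
            return (synth(404, "Unknown director"));
        }
        return (synth(200, "Stickiness reset"));
    }
}
```

Under these assumptions, a POST to /admin/reset-stickiness/live1 from an allowed client would move live traffic for cluster 1 back to b1 (if healthy) while vod1 stays on whatever backend it is currently stuck to.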
thanks! that makes a lot of sense
This has been discussed during bugwash. We understand the need for the feature, but we think that it should be implemented as a CLI command rather than in VCL. @carlosabalde, would something like
varnishadm tell live1 reset_fallback
work for you?
Yes, it works for the use case and it's a much better alternative.
I am going to take this ticket hostage for planning the tell interface.
notes from pow-wow discussion:
- Only warm vcls can be told
- vmod objects can have ears
- can vmods define global ears? (namespace?)
- Interface string ear_f (string) or error code?
- wildcards on vcls ?
- wildcards on object names ?
Currently I am at:
vcl.tell *.objname* anything
vcl_foo.objnameXYZ: yes, I will obey
vcl_foo.objnameABC: No way