varnish-cache icon indicating copy to clipboard operation
varnish-cache copied to clipboard

directors locking: CLI ops vs. VRT_DelDirector / VRT_AddDirector

Open nigoroll opened this issue 6 years ago • 7 comments

This tickets comes from a very true comment by @bsdphk https://github.com/varnishcache/varnish-cache/issues/3094#issuecomment-548274037

Wasn't there something about CLI wandering the backends getting in the way of traffic when creation of dynamic backends got stuck behind CLI's holding the mutex ?

VCL_IterDirector() (rightly) holds the vcl_mtx while it is generating backend information

https://github.com/varnishcache/varnish-cache/blob/8e7b0f01d7c2bb0ae57c016f8bfb01d1fbd00dcb/bin/varnishd/cache/cache_vcl.c#L319

So CLI commands block all director operations.

For all practical purposes, I do not think this is too much of an issue because CLI writes happen into a VSB. So as long as the director list and healthy callbacks do not conduct any time consuming or potentially blocking operations, the potential impact on traffic should be rather limited.

So the first question is: Do we want to do anything about it? Please provide feedback.

If yes, my preferred option would be to solve this issue through director references (CLI gaining references for the duration of the operation), see #2725 . Other options may include to always put directors on a cool list when deleting, so the CLI operations can walk a copy of the director list safely outside the lock.

I want to keep this issue separate from #3094 - because that ticket concerns another lock which we do not actually need and which does not solve this problem either.

nigoroll avatar Nov 07 '19 07:11 nigoroll

The cool list idea would probably generalize VRT_delete_backend() to directors https://github.com/varnishcache/varnish-cache/blob/8e7b0f01d7c2bb0ae57c016f8bfb01d1fbd00dcb/bin/varnishd/cache/cache_backend.c#L646

nigoroll avatar Nov 07 '19 08:11 nigoroll

I am really not happy about the cool-list, primarily because it is optimistic.

I tried adding director refcounts cheaply some weeks ago, but it became messy so I gave up.

We may have to simply bite that bullet.

bsdphk avatar Nov 07 '19 09:11 bsdphk

I still think that the previous refusal of my director refcount suggestion was based on a wrong assumption, and I still stink that the suggestion I made is cheap. I shall go back to that PR and revive it.

nigoroll avatar Nov 07 '19 11:11 nigoroll

Is this ticket serving any purpose any more ?

bsdphk avatar Dec 02 '19 20:12 bsdphk

Is everybody happy with my suggestion https://github.com/varnishcache/varnish-cache/issues/3119#issuecomment-551034495 ?

nigoroll avatar Dec 03 '19 09:12 nigoroll

I find it hard to work with reference counting since that would force VMODs to opt in and not mess everything up.

I currently see no future-proof way to handle this, and speaking strictly about the CLI holding the VCL lock, this can become really ugly in setups with tons of dynamic backends. Maybe we should consider moving away from a tail queue to something that better handles the many-backends case, but that too is its own challenge because of our glob-based backend iteration.

Maybe we should have a "separate" lock to add backends, and then a CLI "before" action to commit them in the "main" backend list to let VCL make progress when the CLI is busy iterating over it.

dridi avatar Dec 03 '19 10:12 dridi

@Dridi I believe my previous PR addressed all your concerns, except for the fact, that it used one additional lock on the same level as the vcl_mtx. I think that additional lock can in fact be avoided and the operations put into VCL_Ref() / VCL_Rel(). Feel free to have a look at the old PR #2725 but please keep in mind the above.

nigoroll avatar Dec 03 '19 10:12 nigoroll