varnish-cache
varnish-cache copied to clipboard
Assert error in mgt_vcl_discard(): Condition((vp->warm)
With #4048 applied, the test case from https://github.com/nigoroll/libvmod-dynamic/issues/110 can easily trigger this assertion failure when a lookup thread takes longer than cli_timeout to finish (I first saw this by accident while träwelling on an ICE train):
*** v1 debug|Error: Child (501422) not responding to CLI, killed it.
*** v1 debug|Assert error in mgt_vcl_discard(), mgt/mgt_vcl.c line 701:
*** v1 debug| Condition((vp->warm) == 0) not true.
This happens when the mgt_vcl_askchild() times out, because the worker process is processing the cold event for longer.
This hack avoids the issue:
diff --git a/bin/varnishd/mgt/mgt_vcl.c b/bin/varnishd/mgt/mgt_vcl.c
index 9c4e01e9c..e20ebf0c0 100644
--- a/bin/varnishd/mgt/mgt_vcl.c
+++ b/bin/varnishd/mgt/mgt_vcl.c
@@ -696,6 +696,10 @@ mgt_vcl_discard(struct cli *cli, struct vclprog *vp)
vp->warm = 0;
} else {
(void)mgt_vcl_setstate(cli, vp, VCL_STATE_COLD);
+ while (vp->warm) {
+ usleep(1000*1000);
+ (void)mgt_vcl_setstate(cli, vp, VCL_STATE_COLD);
+ }
}
if (MCH_Running()) {
AZ(vp->warm);
How should we fix this issue? Some ideas:
- disable the timeout for a COLD event issued from
mgt_vcl_discard() - re-schedule the discard for later like in auto mode.
bugwash: just dial up cli_timeout to some infinity-ish value. good point though: phk: slink, we may want to use a smaller timeout still for the MGT's ping/pong's...