squid
squid copied to clipboard
Bug 4947: EventScheduler loops on same task forever
The event scheduler itself does not update the time variables, which in some conditions may cause it to loop over the same task forever, and the task itself may never complete and simply abort and reschedule itself if it also does not update the time variables and is waiting for a certain timeout.
The fix adds a call to "getCurrentTime()" when the event scheduler checks events in the queue to avoid this condition.
Can one of the admins verify this patch?
Ultimately we worked around the problem be changing the config and just avoiding setting dns_timeout to 0, but my concern was if there might be other cases where this kind of looping would occur and if it would be possible to prevent it in the EventScheduler code.
If it is the responsibility of the zero-delay jobs to handle this correctly then it seems reasonable that it should be fixed on a case-by-case basis, and the fix should just apply to the idnsCheckQueue code. I don't see any reason why this particular task would ever need to be zero-delay, so my first thought was to impose a minimum delay and enforce it either when validating the configuration (do not accept 0 seconds), and/or in idnsTickleQueue when the task is scheduled.
my concern was if there might be other cases where this kind of looping would occur
Your concern is valid, and your work on addressing that concern (to the extent possible/reasonable) is appreciated. I hope that nothing in my previous response implied otherwise.
If it is the responsibility of the zero-delay jobs to handle this correctly
I believe those callers/jobs is the only location where this problem can be prevented. Event queue modifications, like the workaround you have proposed, cannot prevent some infinite event loops. Ideally, it would be nice to detect (and complain about/break) such infinite event loops in the event queue code, but doing so is probably very difficult, and the actual fix of the detected/salvaged problem would still belong to the eventAdd() caller!
then it seems reasonable that it should be fixed on a case-by-case basis, and the fix should just apply to the idnsCheckQueue code. I don't see any reason why this particular task would ever need to be zero-delay, so my first thought was to impose a minimum delay and enforce it either when validating the configuration (do not accept 0 seconds), and/or in idnsTickleQueue when the task is scheduled.
I agree in principle.
In many cases, it is possible to assign some reasonable meaning to zero delays (which would be treated specially in the code, to support that documented meaning). In this particular case, we could consider something like minimal non-blocking delay roughly equivalent to one main loop iteration
[1]. Directive documentation aside, automatically increasing configured zero delays to a minimum non-zero delay[2] is probably the right implementation approach. I would not even warn about this increase -- a directive documentation update would be sufficient.
[1] The zero-delay meaning currently implemented in the broken official code is, essentially, not "one main loop iteration delay" (i.e., "process during the next main loop iteration") but "process during the same loop iteration", which causes the infinite event loop you have discovered.
[2] Consider using std::nextafter(0.0, 1.0)
for that minimum increase/delay.
[2] Consider using
std::nextafter(0.0, 1.0)
for that minimum increase/delay.
Actually, using std::nexttoward(0, 1)
may be slightly better because it avoids spelling out unnecessary details (i.e., the double
type of arguments) and uses simpler integral types. See docs.
Anyone still working on this?