Fix slow tickIfNecessary with infrequently used EWMA
EWMA.tickIfNecessary does an amount of work that is linear in the time that has passed since the EWMA was last ticked. For infrequently used EWMAs this can lead to pauses, observed in the 700-800 millisecond range after a few hundred days of idleness.
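To make the linear cost concrete, here is a rough sketch of how many catch-up ticks pile up while a meter sits idle. It assumes the 5 second tick interval used by Metrics; the class and method names are made up for illustration and are not part of the library.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not library code: counts the ticks a catch-up loop
// would have to replay after a given idle period.
final class CatchUpCost {
    // Assumption: one tick every 5 seconds.
    static final long TICK_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(5);

    // One tick is owed for every full interval that elapsed while idle.
    static long requiredTicks(long idleNanos) {
        return idleNanos / TICK_INTERVAL_NANOS;
    }
}
```

With these assumptions, 300 idle days works out to 5,184,000 ticks per EWMA, all replayed on the first read.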
It's not really necessary to perform every tick, as all that does is slowly approach the smallest representable positive number in a double. Instead, pick a number close to zero and bound the number of ticks so that it is reachable from the largest value representable by the EWMA. Actually approaching the smallest representable number is still measurably slow and not particularly useful.
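A minimal sketch of the bounding idea, not the code in this PR. It assumes an idle tick multiplies the rate by exp(-interval / 60 / minutes), a 5 second interval, Double.MAX_VALUE as the largest starting rate, and an arbitrary floor of 1e-9; all names here are hypothetical.

```java
// Hypothetical sketch: cap the number of idle ticks at the count needed to
// decay from the largest possible rate down to a chosen floor near zero.
final class BoundedTicks {
    static final double INTERVAL_SECONDS = 5.0; // assumed tick interval

    // Factor applied to the rate by one tick during which no events arrived.
    static double decayPerTick(double minutes) {
        return Math.exp(-INTERVAL_SECONDS / 60.0 / minutes);
    }

    // Smallest n such that maxRate * decay^n <= floor, computed with
    // logarithms so that nothing overflows.
    static long maxUsefulTicks(double minutes, double maxRate, double floor) {
        double logDecayMagnitude = INTERVAL_SECONDS / 60.0 / minutes; // -ln(decay)
        return (long) Math.ceil((Math.log(maxRate) - Math.log(floor)) / logDecayMagnitude);
    }

    // Replays at most maxUsefulTicks idle ticks, however long the idle period was.
    static double idleDecay(double rate, double minutes, long elapsedTicks, double floor) {
        long ticks = Math.min(elapsedTicks, maxUsefulTicks(minutes, Double.MAX_VALUE, floor));
        double decay = decayPerTick(minutes);
        for (long i = 0; i < ticks; i++) {
            rate *= decay;
        }
        return rate;
    }
}
```

Under these assumptions the loop length depends only on the window and the floor, not on how long the meter was idle.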
It seems like versions prior to 4.2.x are unmaintained so it's not possible to get this fixed in earlier versions?
More context on the issue and how it was found: CASSANDRA-19332
@aweisberg Thanks for your contribution!
> It seems like versions prior to 4.2.x are unmaintained so it's not possible to get this fixed in earlier versions?
Yes, that's correct. Any version before Dropwizard Metrics 4.2.x is unmaintained.
Which version are you using exactly in Cassandra?
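It's a mix. This is a list of active Cassandra versions and the Metrics version each one uses, with 3.0/3.11 soon to be unsupported. 4.x will probably be supported for 5+ years. Not a big deal, as the workaround is to read all the Meters periodically.
Cassandra 5.0+: Metrics 4.2.19
Cassandra 4.1: Metrics 3.1.5
Cassandra 4.0: Metrics 3.1.5
Cassandra 3.11: Metrics 3.1.5
Cassandra 3.0: Metrics 3.1.0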
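For anyone stuck on an older version, the workaround could look roughly like the sketch below. It is deliberately library-agnostic: each meter is represented by a DoubleSupplier that reads one of its rates (with Dropwizard Metrics that could be `meter::getOneMinuteRate`), and the class name, thread name, and period are assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Hypothetical workaround sketch: read every meter on a schedule so that no
// meter accumulates a large backlog of ticks.
final class MeterWarmer {
    // Reads each rate once and returns how many readers were invoked.
    // The values are discarded; the read itself is what triggers the ticking.
    static int readAll(List<DoubleSupplier> rateReaders) {
        int read = 0;
        for (DoubleSupplier reader : rateReaders) {
            reader.getAsDouble();
            read++;
        }
        return read;
    }

    // Starts a daemon thread that calls readAll every periodSeconds.
    static ScheduledExecutorService start(List<DoubleSupplier> rateReaders, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "meter-warmer");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> readAll(rateReaders),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The caller keeps the returned scheduler and shuts it down when the registry goes away.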
@joschi what is the next step? I reworked the fix slightly to reset the EWMAs to the smallest representable positive value instead of performing even the bounded number of ticks.
It doesn't set it to 0.0 because that was not the existing behavior: once an EWMA has first been used, its rate is always at least Double.MIN_NORMAL. Probably not important, but who knows what downstream code might not want that to change.
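Roughly, the reworked approach behaves like the sketch below. This is an illustration under assumptions, not the code in the PR: the names and the way the tick bound is passed in are hypothetical.

```java
// Hypothetical sketch: once the backlog exceeds the bound, skip the replay
// and reset the rate to a tiny positive value instead of 0.0.
final class IdleReset {
    static double catchUp(double rate, double decayPerTick, long elapsedTicks, long maxTicks) {
        if (elapsedTicks > maxTicks) {
            // Idle long enough that replaying ticks could only approach zero,
            // so jump straight to the floor.
            return Double.MIN_NORMAL;
        }
        // Short backlog: replay the idle ticks as usual.
        for (long i = 0; i < elapsedTicks; i++) {
            rate *= decayPerTick;
        }
        return rate;
    }
}
```

The long-idle case becomes constant time, and the result is still strictly positive, matching the behavior described above.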
@joschi would you mind taking a look at this please? The Cassandra project would be very happy if this was included :)