Fix slow tickIfNecessary with infrequently used EWMA

Open aweisberg opened this issue 1 year ago • 5 comments

EWMA.tickIfNecessary does an amount of work that is linear in the amount of time that has passed since the EWMA was last ticked. For an infrequently used EWMA, this can lead to observed pauses in the 700-800 millisecond range after a few hundred days.

It's not really necessary to perform every tick, since all the extra ticks do is slowly move the rate toward the smallest representable positive number in a double. Instead, pick a number close to zero and bound the number of ticks so that this number is reachable from the largest value representable by the EWMA. Actually approaching the smallest representable number is still measurably slow and not particularly useful.
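A minimal sketch of the bounding idea, not the actual patch: with no new events, each tick multiplies the rate by `(1 - alpha)`, so the number of ticks needed to decay from a starting value to a chosen floor can be computed up front and used as a cap. The class and method names, the `1e-9` floor, and the 5-second tick interval are assumptions made for illustration.

```java
final class BoundedDecaySketch {
    // Assumed tick interval in seconds (illustrative, not taken from the issue).
    static final double TICK_INTERVAL_SECONDS = 5.0;

    // Smoothing factor for an exponentially weighted average over the given window.
    static double alpha(double windowMinutes) {
        return 1.0 - Math.exp(-TICK_INTERVAL_SECONDS / 60.0 / windowMinutes);
    }

    // Number of idle ticks after which a rate starting at `start` is at or below `floor`.
    // Logs are subtracted rather than dividing floor by start, to avoid underflow.
    static long maxUsefulTicks(double alpha, double start, double floor) {
        double perTick = Math.log(1.0 - alpha); // negative
        return (long) Math.ceil((Math.log(floor) - Math.log(start)) / perTick);
    }

    // Apply idle ticks, but never more than maxTicks of them.
    static double decayIdle(double rate, double alpha, long missedTicks, long maxTicks) {
        long ticks = Math.min(missedTicks, maxTicks);
        for (long i = 0; i < ticks; i++) {
            rate *= (1.0 - alpha);
        }
        return rate;
    }

    public static void main(String[] args) {
        double a = alpha(1.0);
        long cap = maxUsefulTicks(a, Double.MAX_VALUE, 1e-9);
        System.out.println("tick cap for a one-minute window: " + cap);
    }
}
```

The cap depends only on `alpha` and the chosen floor, so it can be computed once per EWMA; the cost of catching up is then bounded no matter how long the EWMA has been idle.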

aweisberg avatar Jan 30 '24 22:01 aweisberg

It seems like versions prior to 4.2.x are unmaintained so it's not possible to get this fixed in earlier versions?

More context on the issue and how it was found: CASSANDRA-19332

aweisberg avatar Jan 30 '24 22:01 aweisberg

@aweisberg Thanks for your contribution!

> It seems like versions prior to 4.2.x are unmaintained so it's not possible to get this fixed in earlier versions?

Yes, that's correct. Any version before Dropwizard Metrics 4.2.x is unmaintained.

Which version are you using exactly in Cassandra?

joschi avatar Jan 31 '24 12:01 joschi

It's a mix. This is a list of active versions, with 3.0/3.11 soon to be unsupported. 4.x will probably be supported for 5+ years. Not a big deal, as the workaround is to read all the Meters periodically.

Cassandra 5.0+: Metrics 4.2.19
Cassandra 4.1: Metrics 3.1.5
Cassandra 4.0: Metrics 3.1.5
Cassandra 3.11: Metrics 3.1.5
Cassandra 3.0: Metrics 3.1.0

aweisberg avatar Jan 31 '24 15:01 aweisberg

@joschi what is the next step? I reworked the fix slightly to reset the EWMAs to the smallest representable positive value instead of performing even the bounded amount of ticks.

It doesn't set it to 0.0 because that was not the existing behavior: once the EWMA has first been used, it will always have its rate set to at least Double.MIN_NORMAL. Probably not important, but who knows what downstream code might not want that to change.
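A rough sketch of the reset variant described above, under stated assumptions: once the number of missed ticks exceeds a cutoff, the loop is skipped and the rate is set directly to `Double.MIN_NORMAL` rather than `0.0`. The class name, method name, and `MAX_TICKS` value are hypothetical and are not taken from the actual change.

```java
final class IdleResetSketch {
    // Hypothetical cutoff: beyond this many missed ticks, skip the loop entirely.
    static final long MAX_TICKS = 10_000;

    // Catch up an idle rate; long gaps reset to a tiny positive value instead of ticking.
    static double catchUp(double rate, double alpha, long missedTicks) {
        if (missedTicks > MAX_TICKS) {
            // Not 0.0, so the rate stays strictly positive after a long idle period.
            return Double.MIN_NORMAL;
        }
        for (long i = 0; i < missedTicks; i++) {
            rate *= (1.0 - alpha);
        }
        return rate;
    }

    public static void main(String[] args) {
        System.out.println(catchUp(100.0, 0.08, 1_000_000L) > 0.0);
    }
}
```

Compared with a bounded loop, this does constant work for long gaps, at the cost of jumping straight to the floor value instead of arriving there by decay.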

aweisberg avatar Feb 06 '24 18:02 aweisberg

@joschi would you mind taking a look at this, please? The Cassandra project would be very happy if this were included :)

smiklosovic avatar Mar 25 '24 15:03 smiklosovic