micrometer icon indicating copy to clipboard operation
micrometer copied to clipboard

Percentiles and Steps are using Different Time Windows

Open jmsjr opened this issue 2 years ago • 0 comments

Describe the bug

Percentiles and Steps are using Different Time Windows ( Different starting end ending points in time ). This causes scenarios such as :

  • HistogramSnapshot from a StepTimer will return a non-zero 95th percentile while returning a zero mean and zero count.
  • An increment of a StepCounter will only get reflected at the next step interval. ( e.g. A StepCounter with a step of 1-minute where an increment occured at 00:10:02 will not get reflected in StepCounter.count() until after 00:11:00 ).

Environment

  • Micrometer version: 1.8.3
  • Micrometer registry: StepMeterRegistry ( with a 1-minute step )
  • OS: Linux Ubuntu 20.04.4 LTS 5.13.0-35-generic
  • Java version: 1.8.0_322 - OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)

To Reproduce How to reproduce the bug:

Attached ZIP file contains: metrics.zip

Archive:  metrics.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     1606  2022-03-20 17:44   pom.xml
     4070  2022-03-20 18:00   src/main/java/org/example/metrics/MetricsApplication.java
---------                     -------
     5676                     2 files

That :

  • Sets up a StepMeterRegistry with a 1-minute step
  • Increments a StepCounter only once at the beginning, nothing more.
  • Add 10 dummy recordings to a StepTimer at the beginning, nothing more.
  • Log values from the StepCounter and StepTimer every 10 seconds

Depending on the time the application is executed, you may get slightly different results, but the idea is there.

Sample output ( with line breaks added between output for emphasis ):

2022-03-20 16:53:36.013  INFO 110081 --- [           main] org.example.metrics.MetricsApplication   : Started MetricsApplication in 1.324 seconds (JVM running for 1.51)
2022-03-20 16:53:36.018  INFO 110081 --- [           main] org.example.metrics.MetricsApplication   : Incrementing counter...
2022-03-20 16:53:36.018  INFO 110081 --- [           main] org.example.metrics.MetricsApplication   : Counter incremented.
2022-03-20 16:53:36.018  INFO 110081 --- [           main] org.example.metrics.MetricsApplication   : Adding dummy timer recordings ...
2022-03-20 16:53:36.022  INFO 110081 --- [           main] org.example.metrics.MetricsApplication   : Added dummy timer recordings.
2022-03-20 16:53:36.022  INFO 110081 --- [           main] org.example.metrics.MetricsApplication   : Application ready.


2022-03-20 16:53:37.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:53:37.015  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:53:47.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:53:47.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:53:57.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:53:57.012  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896


2022-03-20 16:54:07.012  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 1.0
2022-03-20 16:54:07.012  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:54:17.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 1.0
2022-03-20 16:54:17.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:54:27.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 1.0
2022-03-20 16:54:27.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896


2022-03-20 16:54:37.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 1.0
2022-03-20 16:54:37.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:54:47.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 1.0
2022-03-20 16:54:47.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:54:57.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 1.0
2022-03-20 16:54:57.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0


2022-03-20 16:55:07.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:55:07.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:17.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:55:17.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:27.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:55:27.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:37.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:55:37.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:47.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:55:47.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:57.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:55:57.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:56:07.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:56:07.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:56:17.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:56:17.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:56:27.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:56:27.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0


2022-03-20 16:56:37.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:56:37.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 0.0, Histogram Max: 0.0, Percentile0.95: 0.0
2022-03-20 16:56:47.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:56:47.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 0.0, Histogram Max: 0.0, Percentile0.95: 0.0
2022-03-20 16:56:57.010  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Counter step.counter: 0.0
2022-03-20 16:56:57.011  INFO 110081 --- [   scheduling-1] org.example.metrics.MetricsApplication   : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 0.0, Histogram Max: 0.0, Percentile0.95: 0.0
2022-03-20 16:57:01.652  INFO 110081 --- [ionShutdownHook] o.s.b.w.e.tomcat.GracefulShutdown        : Commencing graceful shutdown. Waiting for active requests to complete
2022-03-20 16:57:01.655  INFO 110081 --- [tomcat-shutdown] o.s.b.w.e.tomcat.GracefulShutdown        : Graceful shutdown complete

From the above example, you can see from the output that :

  • You have a non-zero 95th percentile but a zero StepCounter.count(), StepTimer.count(), StepTimer.mean(), and HistogramSnapshot().mean return 0 from 16:53:36 up to < 16:54:00 ( which is where I guess the step "interval" starts ).
  • It does not make sense to show a 95th percentile while you have a count and max of zero.

You basically have the following "time-windows" ( in ASCII-format ) :

Pre: Counter Incremented, 10 Dummy Recordings added

16:53:36 ------\----\  <----- Percentile Window Start --------- 95th percentile reported OK at this point, but StepCounter.count(), StepTimer.count(), StepTimer.mean(), and HistogramSnapshot().mean return 0 
16:53:46       |    |         Also Histogram Window Start ----- StepTimer.max() and HistogramSnapshot.max() return non-zero value.
16:53:56       |    |         
16:54:06 ----\ |    |  <----- Counter Step Start ------------ Both StepCounter.count() and StepTimer.count() only start returning 1 at this point. 
16:54:16     | |    |                                           Also, StepTimer.mean() and HistogramSnapshot.mean() only start returning non-zero at this point.
16:54:26     | |    |
16:54:36 ------/    |  <----- Percentile Window End ----------- 1-minute window for percentile finished. 95th percentile returns 0 afterwards.
16:54:46     |      | 
16:54:56     |      | 
16:55:06 ----/      |  <----- Counter Step End -------------- Both StepCounter.count() and StepTimer.count() return 0 after this point.
16:55:16            |                                           Also, StepTimer.mean() and HistogramSnapshot.mean() only start returning non-zero at this point.
16:55:26            |
16:55:36            |
16:55:46            |
16:55:56            |
16:56:06            |
16:56:16            |
16:56:26            |
16:56:36 -----------/  <----- Histogram Window End ------------ 3-minute window for Histogram ends. StepTimer.max() and HistogramSnapshot.max() return 0 afterwards.
16:56:46
16:56:56

Expected behavior

  • Expected that both percentile and "steps" follow the same window, preferrably that the time-window used by the percentile / snapshot is also used by the StepXXX classes. That is, when the meter is created.

Additional context From what I understand, all the StepXXX classes and the poll() and rollCount() method shows that a "step" does not start when the meter started in time, which is what I thought what it was, but it is a "fixed" point in time.

e.g.

  • A 1-minute step is literally a minute in the wall clock with 00 seconds
  • A 5-minute step is literally every 5 minute in the wall clock with 00 seconds ( 16:00:00, 16:05:00, 16:10:00, 16:15:00, 16:20:00, 16:25:00, etc ... )

This appears to be different to how the percentiles are "placed" in the time window.

The bigger the step ( e.g. 5 minutes ), the bigger the time where the numbers possibly do not make sense.

If I started the sample app above at 18:40:12 with a 5-minute step, the 95th percentile will be non-zero but the count() and the mean() will be zero up to just before 18:45:00. Only on after 18:45:00 will the count() and mean() be non-zero.

jmsjr avatar Mar 20 '22 07:03 jmsjr