micrometer
micrometer copied to clipboard
Percentiles and Steps are using Different Time Windows
Describe the bug
Percentiles and Steps are using Different Time Windows ( Different starting end ending points in time ). This causes scenarios such as :
- HistogramSnapshot from a StepTimer will return a non-zero 95th percentile while returning a zero mean and zero count.
- An increment of a StepCounter will only get reflected at the next step interval. ( e.g. A StepCounter with a step of 1-minute where an increment occured at 00:10:02 will not get reflected in StepCounter.count() until after 00:11:00 ).
Environment
- Micrometer version: 1.8.3
- Micrometer registry: StepMeterRegistry ( with a 1-minute step )
- OS: Linux Ubuntu 20.04.4 LTS 5.13.0-35-generic
- Java version: 1.8.0_322 - OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)
To Reproduce How to reproduce the bug:
Attached ZIP file contains: metrics.zip
Archive: metrics.zip
Length Date Time Name
--------- ---------- ----- ----
1606 2022-03-20 17:44 pom.xml
4070 2022-03-20 18:00 src/main/java/org/example/metrics/MetricsApplication.java
--------- -------
5676 2 files
That :
- Sets up a StepMeterRegistry with a 1-minute step
- Increments a StepCounter only once at the beginning, nothing more.
- Add 10 dummy recordings to a StepTimer at the beginning, nothing more.
- Log values from the StepCounter and StepTimer every 10 seconds
Depending on the time the application is executed, you may get slightly different results, but the idea is there.
Sample output ( with line breaks added between output for emphasis ):
2022-03-20 16:53:36.013 INFO 110081 --- [ main] org.example.metrics.MetricsApplication : Started MetricsApplication in 1.324 seconds (JVM running for 1.51)
2022-03-20 16:53:36.018 INFO 110081 --- [ main] org.example.metrics.MetricsApplication : Incrementing counter...
2022-03-20 16:53:36.018 INFO 110081 --- [ main] org.example.metrics.MetricsApplication : Counter incremented.
2022-03-20 16:53:36.018 INFO 110081 --- [ main] org.example.metrics.MetricsApplication : Adding dummy timer recordings ...
2022-03-20 16:53:36.022 INFO 110081 --- [ main] org.example.metrics.MetricsApplication : Added dummy timer recordings.
2022-03-20 16:53:36.022 INFO 110081 --- [ main] org.example.metrics.MetricsApplication : Application ready.
2022-03-20 16:53:37.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:53:37.015 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:53:47.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:53:47.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:53:57.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:53:57.012 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:54:07.012 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 1.0
2022-03-20 16:54:07.012 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:54:17.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 1.0
2022-03-20 16:54:17.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:54:27.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 1.0
2022-03-20 16:54:27.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 10166.992896
2022-03-20 16:54:37.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 1.0
2022-03-20 16:54:37.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:54:47.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 1.0
2022-03-20 16:54:47.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:54:57.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 1.0
2022-03-20 16:54:57.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 10, Histogram Count: 10, Mean: 5500.0, Histogram Mean: 5500.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:07.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:55:07.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:17.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:55:17.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:27.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:55:27.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:37.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:55:37.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:47.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:55:47.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:55:57.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:55:57.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:56:07.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:56:07.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:56:17.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:56:17.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:56:27.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:56:27.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 10000.0, Histogram Max: 10000.0, Percentile0.95: 0.0
2022-03-20 16:56:37.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:56:37.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 0.0, Histogram Max: 0.0, Percentile0.95: 0.0
2022-03-20 16:56:47.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:56:47.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 0.0, Histogram Max: 0.0, Percentile0.95: 0.0
2022-03-20 16:56:57.010 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Counter step.counter: 0.0
2022-03-20 16:56:57.011 INFO 110081 --- [ scheduling-1] org.example.metrics.MetricsApplication : Timer step.timer: Count: 0, Histogram Count: 0, Mean: 0.0, Histogram Mean: 0.0, Max: 0.0, Histogram Max: 0.0, Percentile0.95: 0.0
2022-03-20 16:57:01.652 INFO 110081 --- [ionShutdownHook] o.s.b.w.e.tomcat.GracefulShutdown : Commencing graceful shutdown. Waiting for active requests to complete
2022-03-20 16:57:01.655 INFO 110081 --- [tomcat-shutdown] o.s.b.w.e.tomcat.GracefulShutdown : Graceful shutdown complete
From the above example, you can see from the output that :
- You have a non-zero 95th percentile but a zero StepCounter.count(), StepTimer.count(), StepTimer.mean(), and HistogramSnapshot().mean return 0 from 16:53:36 up to < 16:54:00 ( which is where I guess the step "interval" starts ).
- It does not make sense to show a 95th percentile while you have a count and max of zero.
You basically have the following "time-windows" ( in ASCII-format ) :
Pre: Counter Incremented, 10 Dummy Recordings added
16:53:36 ------\----\ <----- Percentile Window Start --------- 95th percentile reported OK at this point, but StepCounter.count(), StepTimer.count(), StepTimer.mean(), and HistogramSnapshot().mean return 0
16:53:46 | | Also Histogram Window Start ----- StepTimer.max() and HistogramSnapshot.max() return non-zero value.
16:53:56 | |
16:54:06 ----\ | | <----- Counter Step Start ------------ Both StepCounter.count() and StepTimer.count() only start returning 1 at this point.
16:54:16 | | | Also, StepTimer.mean() and HistogramSnapshot.mean() only start returning non-zero at this point.
16:54:26 | | |
16:54:36 ------/ | <----- Percentile Window End ----------- 1-minute window for percentile finished. 95th percentile returns 0 afterwards.
16:54:46 | |
16:54:56 | |
16:55:06 ----/ | <----- Counter Step End -------------- Both StepCounter.count() and StepTimer.count() return 0 after this point.
16:55:16 | Also, StepTimer.mean() and HistogramSnapshot.mean() only start returning non-zero at this point.
16:55:26 |
16:55:36 |
16:55:46 |
16:55:56 |
16:56:06 |
16:56:16 |
16:56:26 |
16:56:36 -----------/ <----- Histogram Window End ------------ 3-minute window for Histogram ends. StepTimer.max() and HistogramSnapshot.max() return 0 afterwards.
16:56:46
16:56:56
Expected behavior
- Expected that both percentile and "steps" follow the same window, preferrably that the time-window used by the percentile / snapshot is also used by the StepXXX classes. That is, when the meter is created.
Additional context From what I understand, all the StepXXX classes and the poll() and rollCount() method shows that a "step" does not start when the meter started in time, which is what I thought what it was, but it is a "fixed" point in time.
e.g.
- A 1-minute step is literally a minute in the wall clock with 00 seconds
- A 5-minute step is literally every 5 minute in the wall clock with 00 seconds ( 16:00:00, 16:05:00, 16:10:00, 16:15:00, 16:20:00, 16:25:00, etc ... )
This appears to be different to how the percentiles are "placed" in the time window.
The bigger the step ( e.g. 5 minutes ), the bigger the time where the numbers possibly do not make sense.
If I started the sample app above at 18:40:12 with a 5-minute step, the 95th percentile will be non-zero but the count() and the mean() will be zero up to just before 18:45:00. Only on after 18:45:00 will the count() and mean() be non-zero.