
HBASE-27316 Time based metrics will be reset after any get request

Open fan1emon opened this issue 3 years ago • 5 comments

JMX metrics can be queried over HTTP.

However, time-based metrics are reset after every GET request.

The root cause may be the implementation of the Histogram method "snapshot":

public Snapshot snapshot() { return histogram.snapshotAndReset(); }

It takes a snapshot and resets the histogram in the same call.

I think it should not reset, because we may need the historical metrics.
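To make the behavior concrete, here is a minimal sketch of the difference between a destructive and a non-destructive read. ToyHistogram and all of its methods are hypothetical names made up for illustration; this is not HBase's Histogram or FastLongHistogram, and the sample values are arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy histogram for illustration only (hypothetical, not an HBase class). */
class ToyHistogram {
  private List<Long> values = new ArrayList<>();
  private long count = 0; // cumulative op count, never reset

  synchronized void update(long v) {
    values.add(v);
    count++;
  }

  /** Non-destructive read: max over everything recorded since the last reset. */
  synchronized long peekMax() {
    return values.isEmpty() ? 0 : Collections.max(values);
  }

  /** Destructive read: returns the max, then discards the recorded samples. */
  synchronized long maxAndReset() {
    long max = peekMax();
    values = new ArrayList<>();
    return max;
  }

  synchronized long getCount() {
    return count;
  }

  public static void main(String[] args) {
    ToyHistogram h = new ToyHistogram();
    h.update(1201);
    h.update(4614);
    System.out.println("first read max  = " + h.maxAndReset());
    h.update(534);
    h.update(3932);
    // The second read only sees samples recorded after the first read.
    System.out.println("second read max = " + h.maxAndReset());
    System.out.println("count           = " + h.getCount());
  }
}
```

With a destructive read, every reader (including an ad hoc HTTP request) changes what the next reader sees, while the separately tracked count keeps growing.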

fan1emon avatar Aug 23 '22 08:08 fan1emon

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 1m 8s | Docker mode activated. |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 33s | master passed |
| +1 :green_heart: | compile | 0m 11s | master passed |
| +1 :green_heart: | shadedjars | 3m 49s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 12s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 19s | the patch passed |
| +1 :green_heart: | compile | 0m 11s | the patch passed |
| +1 :green_heart: | javac | 0m 11s | the patch passed |
| +1 :green_heart: | shadedjars | 3m 45s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 10s | the patch passed |
| | | | _ Other Tests _ |
| -1 :x: | unit | 0m 16s | hbase-metrics in the patch failed. |
| | | 15m 40s | |
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/4719
Optional Tests javac javadoc unit shadedjars compile
uname Linux 4d3552af81d5 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 950ad8dd3e
Default Java AdoptOpenJDK-1.8.0_282-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-metrics.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/testReport/
Max. process+thread count 154 (vs. ulimit of 30000)
modules C: hbase-metrics U: hbase-metrics
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Aug 23 '22 08:08 Apache-HBase

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 0m 40s | Docker mode activated. |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
| | | | _ Prechecks _ |
| | | | _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 3m 3s | master passed |
| +1 :green_heart: | compile | 0m 10s | master passed |
| +1 :green_heart: | shadedjars | 4m 2s | branch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 11s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 42s | the patch passed |
| +1 :green_heart: | compile | 0m 10s | the patch passed |
| +1 :green_heart: | javac | 0m 10s | the patch passed |
| +1 :green_heart: | shadedjars | 4m 1s | patch has no errors when building our shaded downstream artifacts. |
| +1 :green_heart: | javadoc | 0m 10s | the patch passed |
| | | | _ Other Tests _ |
| -1 :x: | unit | 0m 17s | hbase-metrics in the patch failed. |
| | | 16m 26s | |
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/4719
Optional Tests javac javadoc unit shadedjars compile
uname Linux 2a1a76c592b8 5.4.0-1081-aws #88~18.04.1-Ubuntu SMP Thu Jun 23 16:29:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 950ad8dd3e
Default Java AdoptOpenJDK-11.0.10+9
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-metrics.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/testReport/
Max. process+thread count 156 (vs. ulimit of 30000)
modules C: hbase-metrics U: hbase-metrics
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Aug 23 '22 08:08 Apache-HBase

:confetti_ball: +1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 1m 1s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| | | | _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 27s | master passed |
| +1 :green_heart: | compile | 0m 17s | master passed |
| +1 :green_heart: | checkstyle | 0m 8s | master passed |
| +1 :green_heart: | spotless | 0m 41s | branch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 0m 19s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 2m 16s | the patch passed |
| +1 :green_heart: | compile | 0m 15s | the patch passed |
| +1 :green_heart: | javac | 0m 15s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 7s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | hadoopcheck | 8m 5s | Patch does not cause any errors with Hadoop 3.2.4 3.3.4. |
| +1 :green_heart: | spotless | 0m 39s | patch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 0m 22s | the patch passed |
| | | | _ Other Tests _ |
| +1 :green_heart: | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 22m 4s | |
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/4719
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 4fb85b3c2d41 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 950ad8dd3e
Default Java AdoptOpenJDK-1.8.0_282-b08
Max. process+thread count 69 (vs. ulimit of 30000)
modules C: hbase-metrics U: hbase-metrics
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4719/1/console
versions git=2.17.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Aug 23 '22 08:08 Apache-HBase

I can't figure out why the snapshot is reset after every GET request. The op count keeps growing while the quantiles are reset.

I'll fix the test failure if this patch is needed.

fan1emon avatar Aug 24 '22 08:08 fan1emon

First JMX query:

      "FlushTime_num_ops": 11142,
      "FlushTime_min": 1201,
      "FlushTime_max": 4614,
      "FlushTime_mean": 2437,
      "FlushTime_25th_percentile": 1344,
      "FlushTime_median": 2383,
      "FlushTime_75th_percentile": 2647,
      "FlushTime_90th_percentile": 4607,
      "FlushTime_95th_percentile": 4610,
      "FlushTime_98th_percentile": 4612,
      "FlushTime_99th_percentile": 4613,
      "FlushTime_99.9th_percentile": 4613,
      "FlushTime_TimeRangeCount_1000-3000": 4,
      "FlushTime_TimeRangeCount_3000-10000": 1,

Next query:

      "FlushTime_num_ops": 11191,
      "FlushTime_min": 534,
      "FlushTime_max": 3932,
      "FlushTime_mean": 2835,
      "FlushTime_25th_percentile": 2676,
      "FlushTime_median": 3104,
      "FlushTime_75th_percentile": 3531,
      "FlushTime_90th_percentile": 3913,
      "FlushTime_95th_percentile": 3922,
      "FlushTime_98th_percentile": 3928,
      "FlushTime_99th_percentile": 3930,
      "FlushTime_99.9th_percentile": 3931,
      "FlushTime_TimeRangeCount_300-1000": 1,
      "FlushTime_TimeRangeCount_1000-3000": 2,
      "FlushTime_TimeRangeCount_3000-10000": 4,

Some metrics, such as FlushTime_max and FlushTime_TimeRangeCount_1000-3000, are reset between the two queries, while FlushTime_num_ops keeps growing.
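One way to spot the reset mechanically is to compare two readings and list the metrics whose value went down, since a cumulative count or a running maximum can never decrease. The sketch below hardcodes a few values from the two queries above; MetricDiff and decreased are hypothetical helper names, not part of HBase. FlushTime_min is left out on purpose because a running minimum is allowed to drop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for comparing two metric readings. */
class MetricDiff {
  /** Returns the names present in both readings whose value went down. */
  static List<String> decreased(Map<String, Long> first, Map<String, Long> second) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Long> e : first.entrySet()) {
      Long later = second.get(e.getKey());
      if (later != null && later < e.getValue()) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Values copied from the two JMX queries in this thread.
    Map<String, Long> first = new LinkedHashMap<>();
    first.put("FlushTime_num_ops", 11142L);
    first.put("FlushTime_max", 4614L);
    first.put("FlushTime_TimeRangeCount_1000-3000", 4L);

    Map<String, Long> second = new LinkedHashMap<>();
    second.put("FlushTime_num_ops", 11191L);
    second.put("FlushTime_max", 3932L);
    second.put("FlushTime_TimeRangeCount_1000-3000", 2L);

    // Prints [FlushTime_max, FlushTime_TimeRangeCount_1000-3000]
    System.out.println(decreased(first, second));
  }
}
```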

fan1emon avatar Aug 24 '22 08:08 fan1emon

@bbeaudreault I believe we are interested in this change.

briaugenreich avatar Dec 08 '22 21:12 briaugenreich

Yea, I think there are a couple problems with this approach:

  1. The histogram is initially created with some generic buckets that are used to create the distribution, which is then used to calculate the percentiles. Those generic buckets will get more accurate over time, because when you snapshotAndReset they use the boundaries of the old bins to modify the new bins. Take a look at the Bins constructor, which is called in the snapshotAndReset method. I would expect that if we never do snapshotAndReset, we'll have less accurate percentiles especially for outliers. This seems problematic given the typical use-case is for looking at 99th and 99.9th percentiles.
  2. The FastLongHistogram has a few usages outside of jmx metrics. I haven't audited them, but we should be sure that any change here will not adversely affect the expectations of those usages.

We'll need to address those issues in a way that still achieves the goal.
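As a rough illustration of the first point, here is a toy sketch of bins that re-derive their range from what the previous interval observed. AdaptiveBins and its methods are hypothetical names, and the logic is a simplification under stated assumptions: equal-width bins, out-of-range values clamped to the edge bins, and the new range taken from the old min and max. The actual Bins constructor in FastLongHistogram may work differently, so treat this only as a picture of the general idea.

```java
/** Toy equal-width bins for illustration only (hypothetical, not HBase's Bins). */
class AdaptiveBins {
  final long lo;  // inclusive lower edge of the binned range
  final long hi;  // exclusive upper edge of the binned range
  final long[] counts;
  long min = Long.MAX_VALUE;
  long max = Long.MIN_VALUE;
  long total = 0;

  AdaptiveBins(long lo, long hi, int numBins) {
    this.lo = lo;
    this.hi = Math.max(hi, lo + 1); // keep the range non-empty
    this.counts = new long[numBins];
  }

  /** New, empty bins whose range comes from the min/max seen in this interval. */
  AdaptiveBins resetFromObserved() {
    if (total == 0) {
      return new AdaptiveBins(lo, hi, counts.length);
    }
    return new AdaptiveBins(min, max + 1, counts.length);
  }

  void update(long v) {
    min = Math.min(min, v);
    max = Math.max(max, v);
    total++;
    // Values outside [lo, hi) are clamped into the first or last bin.
    long clamped = Math.max(lo, Math.min(hi - 1, v));
    int idx = (int) ((clamped - lo) * counts.length / (hi - lo));
    counts[idx]++;
  }

  /** Estimates quantile q in (0, 1] as the upper edge of the bin holding that rank. */
  long quantile(double q) {
    long rank = (long) Math.ceil(q * total);
    long seen = 0;
    for (int i = 0; i < counts.length; i++) {
      seen += counts[i];
      if (seen >= rank) {
        return lo + (long) Math.ceil((double) (i + 1) * (hi - lo) / counts.length);
      }
    }
    return hi;
  }

  public static void main(String[] args) {
    AdaptiveBins generic = new AdaptiveBins(0, 1000, 10); // generic initial range
    for (long v = 4600; v <= 4620; v++) generic.update(v);
    System.out.println("median with generic bins: " + generic.quantile(0.5));

    AdaptiveBins tuned = generic.resetFromObserved(); // range now fits the data
    for (long v = 4600; v <= 4620; v++) tuned.update(v);
    System.out.println("median with tuned bins:   " + tuned.quantile(0.5));
  }
}
```

In this toy, the generic bins report the edge of their range for every quantile because all samples fall outside it, while the bins created at reset land close to the true median of 4610. Bins that are never recreated would stay at the generic range.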

bbeaudreault avatar Dec 08 '22 21:12 bbeaudreault