hbase icon indicating copy to clipboard operation
hbase copied to clipboard

HBASE-28488 Use encoded name in region span attributes

Open dethi opened this issue 1 year ago • 10 comments

On our busy clusters, the alloc profile shows that createRegionSpan() is responsible for 15-20% of all the allocations. These allocations comes from getRegionNameAsString().

getRegionNameAsString() takes the region name and encode invisible characters in their hex representation. This requires the use of a StringBuilder and thus generate new strings every time.

This becomes really expensive on a cluster with high number of requests. It seems better to just take the encoded region name (the md5 part) and use that in trace attributes, because:

  • it's fixed in size (the full region name can be much longer depending on the rowkey size),
  • it's enough information to link a trace to a region,
  • it doesn't require any new allocation.

dethi avatar Nov 14 '24 21:11 dethi

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 :ok: mvndep 0m 10s Maven dependency ordering for branch
+1 :green_heart: mvninstall 2m 58s master passed
+1 :green_heart: compile 3m 44s master passed
+1 :green_heart: checkstyle 0m 50s master passed
+1 :green_heart: spotbugs 2m 11s master passed
+1 :green_heart: spotless 0m 42s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 10s Maven dependency ordering for patch
+1 :green_heart: mvninstall 2m 47s the patch passed
+1 :green_heart: compile 3m 39s the patch passed
+1 :green_heart: javac 3m 39s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 49s the patch passed
+1 :green_heart: spotbugs 2m 19s the patch passed
+1 :green_heart: hadoopcheck 10m 47s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
-1 :x: spotless 0m 16s patch has 23 errors when running spotless:check, run spotless:apply to fix.
_ Other Tests _
+1 :green_heart: asflicense 0m 17s The patch does not generate ASF License warnings.
39m 21s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6473/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6473
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 2e5b71637e7d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 69e54bb487679a36947fa1534c9f2177ced17b88
Default Java Eclipse Adoptium-17.0.11+9
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6473/1/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-client hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6473/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 14 '24 22:11 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 51s Docker mode activated.
-0 :warning: yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 :ok: mvndep 0m 8s Maven dependency ordering for branch
+1 :green_heart: mvninstall 3m 58s master passed
+1 :green_heart: compile 1m 27s master passed
+1 :green_heart: javadoc 0m 58s master passed
+1 :green_heart: shadedjars 6m 45s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 13s Maven dependency ordering for patch
+1 :green_heart: mvninstall 3m 56s the patch passed
+1 :green_heart: compile 1m 57s the patch passed
+1 :green_heart: javac 1m 57s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed
+1 :green_heart: shadedjars 6m 30s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 :x: unit 2m 3s /patch-unit-hbase-client.txt hbase-client in the patch failed.
-1 :x: unit 259m 14s /patch-unit-hbase-server.txt hbase-server in the patch failed.
294m 8s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6473/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6473
Optional Tests javac javadoc unit compile shadedjars
uname Linux 7ac63953e579 5.4.0-195-generic #215-Ubuntu SMP Fri Aug 2 18:28:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 69e54bb487679a36947fa1534c9f2177ced17b88
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6473/1/testReport/
Max. process+thread count 5582 (vs. ulimit of 30000)
modules C: hbase-client hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6473/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 15 '24 02:11 Apache-HBase

Interesting.

Could you please upload a flame graph about this? I think @ndimiduk should have tested performance impact when implementing this feature. It is a bit strange why we didn't catch this before.

Thanks.

Apache9 avatar Nov 15 '24 14:11 Apache9

The flamegraph is attached to the jira ticket.

dethi avatar Nov 15 '24 16:11 dethi

Hm I need to fix some of the tests. The patch was for HBase 2.5.10, the actual version we run. Didn't think of making sure it was enough for HBase 3.x.

dethi avatar Nov 15 '24 16:11 dethi

So the problem is allocation?

What if we just store the region name some where so we do not need to construct it every time?

Apache9 avatar Nov 16 '24 04:11 Apache9

Actually, this patch changes the values attached to the span, which means the attribute key name is no longer accurate. If we cannot preserve the existing values, we should change the attribute as well and we'll need a release note.

ndimiduk avatar Nov 16 '24 08:11 ndimiduk

Oops. Yeah that sounds bad. Thank you for fixing it. What is your configured sampling rate?

We actually don’t have tracing enabled. But spans will still be created even if not sampled, AFAIK.

dethi avatar Nov 16 '24 11:11 dethi

So the problem is allocation?

What if we just store the region name some where so we do not need to construct it every time?

That’s another option yes. To me it felt like the encoded name was a better attribute because it’s always the same size and short. Whereas the region name is highly dependent on the table data itself. The encoded region name is also what is used as a label/attribute in the per-region metrics.

For example, our region name tend to be quite long because our rowkeys are arbitrary long filesystem-like path, encoded in a binary format (message pack).

dethi avatar Nov 16 '24 11:11 dethi

Okay I agree with you that the encoded region name is better anyway. We've not discussed how we treat span attributed wrt interface compatibility across versions. I think that this should be okay for the next minor release without controversy. We can also deprecate the unencoded name version, but we'll need to call that out in a release note.

For patch releases, we'll potentially break people by removing the attribute. If we could avoid the allocations but preserve existing behavior, that will be the most compatible solution and can be ported to all active branches.

ndimiduk avatar Mar 31 '25 19:03 ndimiduk