
HBASE-28904 Supports enabling storage policy in the data copying scenario of bulkload

Open 2005hithlj opened this issue 1 year ago • 11 comments

https://issues.apache.org/jira/browse/HBASE-28904

2005hithlj avatar Oct 05 '24 09:10 2005hithlj

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 38s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 :green_heart: mvninstall 3m 11s master passed
+1 :green_heart: compile 3m 4s master passed
+1 :green_heart: checkstyle 0m 39s master passed
+1 :green_heart: spotbugs 1m 38s master passed
+1 :green_heart: spotless 0m 47s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 2m 57s the patch passed
+1 :green_heart: compile 3m 3s the patch passed
+1 :green_heart: javac 3m 3s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 37s the patch passed
+1 :green_heart: spotbugs 1m 42s the patch passed
+1 :green_heart: hadoopcheck 10m 39s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 :green_heart: spotless 0m 44s patch has no errors when running spotless:check.
_ Other Tests _
+1 :green_heart: asflicense 0m 12s The patch does not generate ASF License warnings.
36m 38s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6347
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux b8ee008ee1e5 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a5addd5c4acf7ef5f16ab81b4705e245c0d404a2
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Oct 05 '24 10:10 Apache-HBase

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 27s Docker mode activated.
-0 :warning: yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 3m 17s master passed
+1 :green_heart: compile 0m 57s master passed
+1 :green_heart: javadoc 0m 27s master passed
+1 :green_heart: shadedjars 5m 46s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 3m 3s the patch passed
+1 :green_heart: compile 0m 57s the patch passed
+1 :green_heart: javac 0m 57s the patch passed
+1 :green_heart: javadoc 0m 26s the patch passed
+1 :green_heart: shadedjars 5m 43s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 :green_heart: unit 211m 43s hbase-server in the patch passed.
237m 10s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6347
Optional Tests javac javadoc unit compile shadedjars
uname Linux 462efdb3b437 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a5addd5c4acf7ef5f16ab81b4705e245c0d404a2
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/testReport/
Max. process+thread count 5299 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Oct 05 '24 13:10 Apache-HBase

@Apache9 sir. Could you take a look? Thanks.

2005hithlj avatar Oct 15 '24 12:10 2005hithlj

The current implementation sets a tiered storage policy on the bulktoken/family staging directory, which is cleaned up after the bulkload completes. Therefore, I haven't figured out how to verify it through a UT.
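For reference, the policy being applied to that staging directory can be inspected or set manually with the standard HDFS CLI. A minimal sketch, assuming a hypothetical staging path and the built-in ONE_SSD policy (neither is taken from the patch itself):

```shell
# Set a tiered storage policy on a bulkload staging family directory.
# The path below is an illustrative example, not the actual layout.
hdfs storagepolicies -setStoragePolicy \
  -path /hbase/staging/bulktoken/family -policy ONE_SSD

# Verify the policy now in effect on the directory.
hdfs storagepolicies -getStoragePolicy \
  -path /hbase/staging/bulktoken/family
```

Since the staging directory is deleted once the load finishes, any such check has to happen while the bulkload is still in flight, which is what makes a UT awkward.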

2005hithlj avatar Oct 15 '24 13:10 2005hithlj

This patch has already been used by our customers in production environments.

2005hithlj avatar Oct 15 '24 13:10 2005hithlj

Hi @2005hithlj Have you tried HBASE-15172? Based on description of HBASE-15172 we already support this for bulkload. Does that JIRA not work?

NihalJain avatar Oct 15 '24 17:10 NihalJain

@NihalJain Thanks for your review. The bulkload process consists of two steps:

  1. Generate HFiles using MR/Spark and write them to an HDFS cluster.
  2. Execute 'hbase completebulkload [OPTIONS] </PATH/TO/HFILEOUTPUTFORMAT-OUTPUT> <TABLENAME>' or invoke the BulkLoadHFilesTool API.

HBASE-15172 implements tiered storage capabilities for bulkload, but it only applies to scenarios where the HFiles generated by MR/Spark are written directly to the HDFS cluster used by HBase (where tiered storage is configured). However, in most bulkload scenarios, the HFiles generated by MR/Spark are first written to an offline HDFS cluster (a non-HBase HDFS cluster where tiered storage is not configured). The 'hbase completebulkload' command is then used to copy these HFiles from the offline HDFS cluster to the HDFS cluster used by HBase and rename them into the appropriate table/region/columnfamily directory. That scenario is not supported by HBASE-15172; this issue adds tiered storage support for this more general bulkload scenario.
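As a rough illustration of the two-step flow described above (the cluster addresses, paths, table name, and column spec are all hypothetical):

```shell
# Step 1: generate HFiles with an MR job (ImportTsv here, as an
# example); the output lands on the offline, non-HBase HDFS cluster.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:q \
  -Dimporttsv.bulk.output=hdfs://offline-cluster/bulk/mytable \
  mytable hdfs://offline-cluster/input/mytable.tsv

# Step 2: completebulkload copies the HFiles from the offline cluster
# into the HBase cluster's HDFS (via the staging directory) and moves
# them under the table's region/columnfamily directories.
hbase completebulkload hdfs://offline-cluster/bulk/mytable mytable
```

It is this cross-cluster copy in step 2, through the staging directory, that this issue targets: the staged files should already carry the tiered storage policy before being renamed into place.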

2005hithlj avatar Oct 16 '24 05:10 2005hithlj

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 :ok: mvndep 0m 9s Maven dependency ordering for branch
+1 :green_heart: mvninstall 3m 4s master passed
+1 :green_heart: compile 3m 44s master passed
+1 :green_heart: checkstyle 0m 55s master passed
-1 :x: spotbugs 1m 34s /branch-spotbugs-hbase-server-warnings.html hbase-server in master has 1 extant spotbugs warnings.
+1 :green_heart: spotless 0m 46s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 11s Maven dependency ordering for patch
+1 :green_heart: mvninstall 2m 58s the patch passed
+1 :green_heart: compile 3m 47s the patch passed
+1 :green_heart: javac 3m 47s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 55s the patch passed
+1 :green_heart: spotbugs 2m 27s the patch passed
+1 :green_heart: hadoopcheck 11m 9s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 :green_heart: spotless 0m 45s patch has no errors when running spotless:check.
_ Other Tests _
+1 :green_heart: asflicense 0m 21s The patch does not generate ASF License warnings.
40m 56s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6347
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 6f19b49822f8 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 204a5ac828efb077f7734b16a416ab6e35e86fbd
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-common hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Oct 16 '24 15:10 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 30s Docker mode activated.
-0 :warning: yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 :ok: mvndep 0m 9s Maven dependency ordering for branch
+1 :green_heart: mvninstall 3m 15s master passed
+1 :green_heart: compile 1m 15s master passed
+1 :green_heart: javadoc 0m 43s master passed
+1 :green_heart: shadedjars 5m 44s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 13s Maven dependency ordering for patch
+1 :green_heart: mvninstall 2m 59s the patch passed
+1 :green_heart: compile 1m 16s the patch passed
+1 :green_heart: javac 1m 16s the patch passed
+1 :green_heart: javadoc 0m 41s the patch passed
+1 :green_heart: shadedjars 5m 38s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 :green_heart: unit 2m 15s hbase-common in the patch passed.
-1 :x: unit 224m 13s /patch-unit-hbase-server.txt hbase-server in the patch failed.
253m 42s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6347
Optional Tests javac javadoc unit compile shadedjars
uname Linux 500ec2cc2961 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 204a5ac828efb077f7734b16a416ab6e35e86fbd
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/testReport/
Max. process+thread count 5501 (vs. ulimit of 30000)
modules C: hbase-common hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Oct 16 '24 18:10 Apache-HBase


Thank you for the detailed explanation @2005hithlj

NihalJain avatar Oct 17 '24 11:10 NihalJain

Since there are no UTs here, please specify the exact steps to test this so that others can validate and use the functionality.

NihalJain avatar Oct 17 '24 11:10 NihalJain