HBASE-28904 Supports enabling storage policy in the data copying scenario of bulkload
https://issues.apache.org/jira/browse/HBASE-28904
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 38s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. | |
| _ master Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 11s | master passed | |
| +1 :green_heart: | compile | 3m 4s | master passed | |
| +1 :green_heart: | checkstyle | 0m 39s | master passed | |
| +1 :green_heart: | spotbugs | 1m 38s | master passed | |
| +1 :green_heart: | spotless | 0m 47s | branch has no errors when running spotless:check. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 2m 57s | the patch passed | |
| +1 :green_heart: | compile | 3m 3s | the patch passed | |
| +1 :green_heart: | javac | 3m 3s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 0m 37s | the patch passed | |
| +1 :green_heart: | spotbugs | 1m 42s | the patch passed | |
| +1 :green_heart: | hadoopcheck | 10m 39s | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. | |
| +1 :green_heart: | spotless | 0m 44s | patch has no errors when running spotless:check. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | asflicense | 0m 12s | The patch does not generate ASF License warnings. | |
| | | 36m 38s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6347 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux b8ee008ee1e5 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a5addd5c4acf7ef5f16ab81b4705e245c0d404a2 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 27s | Docker mode activated. | |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck | |
| _ Prechecks _ | ||||
| _ master Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 17s | master passed | |
| +1 :green_heart: | compile | 0m 57s | master passed | |
| +1 :green_heart: | javadoc | 0m 27s | master passed | |
| +1 :green_heart: | shadedjars | 5m 46s | branch has no errors when building our shaded downstream artifacts. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 3s | the patch passed | |
| +1 :green_heart: | compile | 0m 57s | the patch passed | |
| +1 :green_heart: | javac | 0m 57s | the patch passed | |
| +1 :green_heart: | javadoc | 0m 26s | the patch passed | |
| +1 :green_heart: | shadedjars | 5m 43s | patch has no errors when building our shaded downstream artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 211m 43s | hbase-server in the patch passed. | |
| | | 237m 10s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6347 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 462efdb3b437 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a5addd5c4acf7ef5f16ab81b4705e245c0d404a2 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/testReport/ |
| Max. process+thread count | 5299 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
@Apache9 sir. Could you take a look? Thanks.
The current implementation sets a tiered storage policy on the bulktoken/family directory, which is cleaned up after the bulkload completes. Because that directory no longer exists afterwards, I have not found a way to verify the policy in a unit test (UT).
This patch has already been used by our customers in production environments.
Hi @2005hithlj, have you tried HBASE-15172? Based on its description, we already support this for bulkload. Does that JIRA not work for you?
@NihalJain Thanks for your review. The bulkload process consists of two steps:
- Generate HFiles using MR/Spark and write them to an HDFS cluster.
- Execute `hbase completebulkload [OPTIONS] </PATH/TO/HFILEOUTPUTFORMAT-OUTPUT> <TABLENAME>` or invoke the BulkLoadHFilesTool API.
HBASE-15172 implements tiered storage for bulkload, but it only applies when the HFiles generated by MR/Spark are written directly to the HDFS cluster used by HBase (where tiered storage is configured). In most bulkload scenarios, however, the HFiles are first written to an offline HDFS cluster (a non-HBase HDFS cluster with no tiered storage configured). The `hbase completebulkload` command then copies these HFiles from the offline cluster to the HDFS cluster used by HBase and renames them into the appropriate table/region/columnfamily directory. HBASE-15172 does not support this scenario; this issue adds tiered storage support for this more general bulkload scenario.
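To make the copy scenario concrete, here is a toy in-memory Java sketch. It is not HBase or HDFS code: `ToyFs`, `bulkload`, and all paths are hypothetical names invented for illustration. It assumes two HDFS behaviours: a file's blocks are placed according to the storage policy in effect on its parent directory at write time (defaulting to `HOT`), and a rename only changes metadata without moving blocks. Under those assumptions, the policy has to be set on the staging (bulktoken/family) directory before the copy for it to have any effect.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a copy-then-rename bulkload; NOT real HBase/HDFS code. */
public class BulkloadStoragePolicySketch {

  /** Hypothetical stand-in for a file system with per-directory storage policies. */
  public static final class ToyFs {
    // directory path -> storage policy name set on that directory
    private final Map<String, String> dirPolicy = new HashMap<>();
    // file path -> policy that was in effect when the file's blocks were written
    private final Map<String, String> filePlacement = new HashMap<>();

    public void setStoragePolicy(String dir, String policy) {
      dirPolicy.put(dir, policy);
    }

    /** Writing a file places its blocks by the parent directory's policy (assumed default: HOT). */
    public void write(String path) {
      String parent = path.substring(0, path.lastIndexOf('/'));
      filePlacement.put(path, dirPolicy.getOrDefault(parent, "HOT"));
    }

    /** Rename is modelled as metadata-only: the recorded placement does not change. */
    public void rename(String src, String dst) {
      filePlacement.put(dst, filePlacement.remove(src));
    }

    public String placementOf(String path) {
      return filePlacement.get(path);
    }
  }

  /**
   * Copies one HFile into the staging family directory, then renames it into the
   * store directory. If policyOrNull is non-null, it is set on the staging
   * directory before the copy. Returns the placement of the final file.
   */
  public static String bulkload(ToyFs fs, String stagingFamilyDir, String storeDir,
      String hfileName, String policyOrNull) {
    if (policyOrNull != null) {
      fs.setStoragePolicy(stagingFamilyDir, policyOrNull);
    }
    String staged = stagingFamilyDir + "/" + hfileName;
    fs.write(staged); // the "copy from the offline cluster" step
    String finalPath = storeDir + "/" + hfileName;
    fs.rename(staged, finalPath); // move into table/region/columnfamily
    return fs.placementOf(finalPath);
  }

  public static void main(String[] args) {
    String staging = "/staging/bulktoken/cf"; // hypothetical paths
    String store = "/data/table/region/cf";
    System.out.println(bulkload(new ToyFs(), staging, store, "hfile1", null));
    System.out.println(bulkload(new ToyFs(), staging, store, "hfile1", "ONE_SSD"));
  }
}
```

In this model the first call leaves the file with the default placement, while the second call, which sets the policy on the staging directory first, yields a file written under `ONE_SSD`. Whether a real cluster behaves the same way depends on its HDFS version and configuration, so treat this only as an illustration of the ordering.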
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 42s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. | |
| _ master Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 9s | Maven dependency ordering for branch | |
| +1 :green_heart: | mvninstall | 3m 4s | master passed | |
| +1 :green_heart: | compile | 3m 44s | master passed | |
| +1 :green_heart: | checkstyle | 0m 55s | master passed | |
| -1 :x: | spotbugs | 1m 34s | /branch-spotbugs-hbase-server-warnings.html | hbase-server in master has 1 extant spotbugs warnings. |
| +1 :green_heart: | spotless | 0m 46s | branch has no errors when running spotless:check. | |
| _ Patch Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 11s | Maven dependency ordering for patch | |
| +1 :green_heart: | mvninstall | 2m 58s | the patch passed | |
| +1 :green_heart: | compile | 3m 47s | the patch passed | |
| +1 :green_heart: | javac | 3m 47s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 0m 55s | the patch passed | |
| +1 :green_heart: | spotbugs | 2m 27s | the patch passed | |
| +1 :green_heart: | hadoopcheck | 11m 9s | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. | |
| +1 :green_heart: | spotless | 0m 45s | patch has no errors when running spotless:check. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | asflicense | 0m 21s | The patch does not generate ASF License warnings. | |
| | | 40m 56s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6347 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 6f19b49822f8 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 204a5ac828efb077f7734b16a416ab6e35e86fbd |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 30s | Docker mode activated. | |
| -0 :warning: | yetus | 0m 2s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck | |
| _ Prechecks _ | ||||
| _ master Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 9s | Maven dependency ordering for branch | |
| +1 :green_heart: | mvninstall | 3m 15s | master passed | |
| +1 :green_heart: | compile | 1m 15s | master passed | |
| +1 :green_heart: | javadoc | 0m 43s | master passed | |
| +1 :green_heart: | shadedjars | 5m 44s | branch has no errors when building our shaded downstream artifacts. | |
| _ Patch Compile Tests _ | ||||
| +0 :ok: | mvndep | 0m 13s | Maven dependency ordering for patch | |
| +1 :green_heart: | mvninstall | 2m 59s | the patch passed | |
| +1 :green_heart: | compile | 1m 16s | the patch passed | |
| +1 :green_heart: | javac | 1m 16s | the patch passed | |
| +1 :green_heart: | javadoc | 0m 41s | the patch passed | |
| +1 :green_heart: | shadedjars | 5m 38s | patch has no errors when building our shaded downstream artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 2m 15s | hbase-common in the patch passed. | |
| -1 :x: | unit | 224m 13s | /patch-unit-hbase-server.txt | hbase-server in the patch failed. |
| | | 253m 42s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6347 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 500ec2cc2961 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 204a5ac828efb077f7734b16a416ab6e35e86fbd |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/testReport/ |
| Max. process+thread count | 5501 (vs. ulimit of 30000) |
| modules | C: hbase-common hbase-server U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6347/3/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
Thank you for the detailed explanation, @2005hithlj.
Since there are no UTs here, please specify the exact steps to test this so that others can validate and use the functionality.