
HBASE-29108: Cleanup storefiles when file close fails during compaction

Open · tomasbanet opened this issue 10 months ago · 9 comments

Environment

Hadoop server version: 3.4.0, revision=bd8b77f398f626bb7791783192ee7a5dfaeec760
HBase version: 2.5.10-hadoop3, revision=a3af60980c61fb4be31e0dcd89880f304d01098a
Hadoop client version used by HBase: 3.3.6, revision=1be78238728da9266a4f88195058f08fd012bf9c

Relevant hdfs-site.xml config:

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.replication.min</name>
    <value>2</value>
  </property>

The HBase table is configured with DFS_REPLICATION => 2.
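For reference, such a table can be created from the HBase shell by setting DFS_REPLICATION on the column family. This is an illustrative fragment, not the exact DDL used in the report; the family name `e` matches the store seen in the logs below:

```ruby
# hbase shell fragment (illustrative): the family's HFiles are written with
# an HDFS replication factor of 2 instead of the filesystem default of 3
create 'test_table', {NAME => 'e', DFS_REPLICATION => '2'}
```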

What

Clean up storefiles in the .tmp directory when a compaction fails due to an IOException from writer.close() in DefaultCompactor.commitWriter(), to avoid leaving behind unused, under-construction storefiles.

In our test clusters we observed multiple instances of the following exception when compacting a table with DFS_REPLICATION => 2 while dfs.namenode.replication.min is set to 2 and an HDFS datanode that was writing the compaction output is SIGKILLed.

2025-01-27 03:53:31,758 INFO  [regionserver/regionserver-4:16020-longCompactions-0] hdfs.DFSClient: Could not complete /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325 retrying…
2025-01-27 03:53:38,159 INFO  [regionserver/regionserver-4:16020-longCompactions-0] hdfs.DFSClient: Could not complete /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325 retrying…
2025-01-27 03:53:38,161 ERROR [regionserver/regionserver-4:16020-longCompactions-0] regionserver.CompactSplit: Compaction failed region=test_table,\x01,1726087406036.27f01f729d8d992d100222f6fce09565., storeName=27f01f729d8d992d100222f6fce09565/e, priority=12, startTime=1737949992446
java.io.IOException: Unable to close file because the last block BP-984347226-240.160.11.12-1670593994803:blk_1079072071_5477525 does not have enough number of replicas.
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:969)
	at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:909)
	at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:892)
	at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:829)
	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:663)
	at org.apache.hadoop.hbase.regionserver.StoreFileWriter$SingleStoreFileWriter.close(StoreFileWriter.java:786)
	at org.apache.hadoop.hbase.regionserver.StoreFileWriter$SingleStoreFileWriter.access$1200(StoreFileWriter.java:493)
	at org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:293)
	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter(DefaultCompactor.java:70)
	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter(DefaultCompactor.java:38)
	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:393)
	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:122)
	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1151)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2278)
	at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:667)
	at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:716)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Active namenode logs:

2025-01-27 03:53:38,160 INFO namenode.FSNamesystem: BLOCK* blk_1079072071_5477525 is COMMITTED but not COMPLETE(numNodes= 1 <  minimum = 2) in file /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325

The storefile in the .tmp directory remains open/under-construction, as shown by hdfs dfsadmin -listOpenFiles.

Later, the namenode may mark some of this storefile's block replicas as corrupt:

FSCK started by hadoop (auth:SIMPLE) from /240.160.21.252 at Mon Jan 27 21:21:37 UTC 2025
Block Id: blk_1079072071
Block belongs to: /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325
No. of Expected Replica: 2
No. of live Replica: 1
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 1
Block replica on datanode/rack: datanode-v12-0-hadoop.hadoop/default-rack is CORRUPT	 ReasonCode: GENSTAMP_MISMATCH
Block replica on datanode/rack: datanode-v11-3-hadoop.hadoop/default-rack is HEALTHY

However, the corrupt replica is not repaired as long as the file remains open.

As a workaround, the region must be re-opened (e.g. by restarting the regionserver), which deletes all files under .tmp.

Implementation Notes

BAD behaviour: In org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter() (DefaultCompactor.java:70) we call writer.close(); an IOException thrown by writer.close() is not caught.

Example of GOOD behaviour: In DefaultCompactor.abortWriter(), the writer.close() call is wrapped in a try-catch statement; if an IOException is caught, the storefiles written to the .tmp directory are deleted.

Fix: Wrap the writer.close() call in commitWriter() in a try-catch statement, clean up the storefiles if writer.close() throws an IOException, and re-throw the exception to preserve the current behaviour of commitWriter().

Add a cleanup helper method called by both commitWriter() and abortWriter().
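The pattern described above can be sketched as follows. This is an illustrative, self-contained sketch only, not HBase's actual code: `closeAndCleanupOnFailure` and `CommitCleanupSketch` are hypothetical names, and a plain `Closeable` plus `java.nio.file.Path` stand in for HBase's StoreFileWriter and the .tmp storefile path. The real patch would apply the same shape inside commitWriter(), with the catch-block cleanup factored into the shared helper.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch (NOT HBase's actual code): wrap close() in try-catch,
// delete the partially written temp output on failure -- as abortWriter()
// already does -- and re-throw so callers still observe the IOException.
public class CommitCleanupSketch {

  /** Closes the writer; on IOException, deletes the temp file and re-throws. */
  static void closeAndCleanupOnFailure(Closeable writer, Path tmpFile) throws IOException {
    try {
      writer.close();
    } catch (IOException e) {
      // In the real patch this cleanup would live in a helper method
      // called from both commitWriter() and abortWriter().
      Files.deleteIfExists(tmpFile);
      throw e;
    }
  }

  public static void main(String[] args) throws Exception {
    Path tmp = Files.createTempFile("storefile", ".tmp");
    // A writer whose close() always fails, simulating the HDFS
    // "not enough replicas" error seen in this report.
    Closeable failingWriter = () -> {
      throw new IOException("simulated close failure");
    };
    boolean rethrown = false;
    try {
      closeAndCleanupOnFailure(failingWriter, tmp);
    } catch (IOException expected) {
      rethrown = true;
    }
    if (!rethrown) throw new AssertionError("IOException was not re-thrown");
    if (Files.exists(tmp)) throw new AssertionError("temp file was not cleaned up");
    System.out.println("cleanup-on-failed-close OK");
  }
}
```

Re-throwing preserves the current contract of commitWriter() (the compaction still fails and is retried later); only the orphaned .tmp output changes.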

Testing

Run 1 HBase master, 1 regionserver, and 3 HDFS datanodes with the hdfs-site.xml config given above.

Create a test table with the option DFS_REPLICATION => 2.

Pre-split the table into 20 regions (enough that region files are spread across all 3 datanodes), then write and flush <128 MiB of data to each region (so each region has one storefile consisting of one HDFS block).

Run the following command to repeatedly trigger a compaction, SIGKILL one datanode, and restart the datanodes:

while true; do
  echo "major_compact 'test_table'" | bin/hbase shell -n &&
    sleep $(( RANDOM % 10 )) &&
    kill -SIGKILL $(jps | grep "DataNode" | cut -d ' ' -f 1 | head -n 1)
  command_to_restart_all_datanodes
  sleep 30
done

After ~10 minutes, observe dozens of storefiles left behind in .tmp directories:

25-01-28 17:21:25 :: hdfs dfs -ls -R /hbase/data/default/test_table/ | grep ".tmp/e/"
-rw-r--r--   2 tomasb supergroup   34499479 2025-01-28 17:06 /hbase/data/default/test_table/05a7b0914c86b057f63751e8048051c1/.tmp/e/43267a0cc82546c7a35e1ab9802c2ab6
-rw-r--r--   2 tomasb supergroup   34499410 2025-01-28 17:09 /hbase/data/default/test_table/05a7b0914c86b057f63751e8048051c1/.tmp/e/c3c14f169c434cf19e857305ec740d3d
-rw-r--r--   2 tomasb supergroup   69328898 2025-01-28 17:03 /hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/164b07ca7dc2426a8b6addcaccab8a56
-rw-r--r--   2 tomasb supergroup   69328932 2025-01-28 17:10 /hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/34b0f11a90174862a8aa98d5017d4897
-rw-r--r--   2 tomasb supergroup   69328898 2025-01-28 16:39 /hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/7d707278ded847f29434a07a762e07b6
-rw-r--r--   2 tomasb supergroup   34499514 2025-01-28 17:04 /hbase/data/default/test_table/46e9610f88a53238f4e493442c1f559f/.tmp/e/07aedd687623493c9fd5b5ac1659b04d
-rw-r--r--   2 tomasb supergroup   34499444 2025-01-28 17:09 /hbase/data/default/test_table/46e9610f88a53238f4e493442c1f559f/.tmp/e/4ab35a08cf954dd0af32ae316ceefe42
-rw-r--r--   2 tomasb supergroup   68477765 2025-01-28 17:06 /hbase/data/default/test_table/4713d8c0cd11eded425fa66ce65a319a/.tmp/e/18ef189729c74cc0b912a238b25c277e
...

Observe that the storefiles remain open / under-construction:

25-01-28 17:21:31 :: hdfs dfsadmin -listOpenFiles
Client Host         	Client Name         	Open File Path
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/be0f2fc0a596b51c3a261da3bd142df1/.tmp/e/5cf60cb9012e440e93f8d0bfefde530d
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/WALs/home-host-1,16020,1738081727251/home-host-1%2C16020%2C1738081727251.1738082328949
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/7d707278ded847f29434a07a762e07b6
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/b168116917028b9d650f5ecfc46d4bf6/.tmp/e/9f499eb1d52a4fdf8fe3efea403c60bd
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/846e1635811bff4b99f90bca68689a74/.tmp/e/b13af9a221964abcbdb88607eb703bf3
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/WALs/home-host-1,16020,1738081727251/home-host-1%2C16020%2C1738081727251.meta.1738083661870.meta
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/92b0e358333c7bccf57a966a712cd560/.tmp/e/675c14a4c0fa43199cfc49785ed3f87f
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/846e1635811bff4b99f90bca68689a74/.tmp/e/10d42be38744490b958be09c4a3815b7
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/62bc3f7199a06af306d093f7a5fb38b9/.tmp/e/8d5ad0a40796482197055b2d19f2b4b9
192.168.0.21        	DFSClient_NONMAPREDUCE_-1807495440_1	/hbase/data/default/test_table/92b0e358333c7bccf57a966a712cd560/.tmp/e/120bf1921c2a45ff8d3437e1fdaf07d8
...

Disable and enable the test table to re-open all regions and clear files under the .tmp directories.
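A sketch of this step, using the same non-interactive shell invocation as the compaction loop above (requires a running cluster):

```shell
# Re-opening all regions of the table deletes the files under .tmp
echo "disable 'test_table'" | bin/hbase shell -n
echo "enable 'test_table'"  | bin/hbase shell -n
```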

Apply the patch and run the test again. Observe that failed compactions no longer fill .tmp directories with storefiles and that storefiles are not left open.

tomasbanet · Feb 04 '25 12:02

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 :green_heart: mvninstall 3m 21s master passed
+1 :green_heart: compile 3m 9s master passed
+1 :green_heart: checkstyle 0m 37s master passed
+1 :green_heart: spotbugs 1m 35s master passed
+1 :green_heart: spotless 0m 45s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 3m 5s the patch passed
+1 :green_heart: compile 3m 7s the patch passed
+1 :green_heart: javac 3m 7s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 35s the patch passed
+1 :green_heart: spotbugs 1m 38s the patch passed
+1 :green_heart: hadoopcheck 11m 42s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 :green_heart: spotless 0m 43s patch has no errors when running spotless:check.
_ Other Tests _
+1 :green_heart: asflicense 0m 10s The patch does not generate ASF License warnings.
38m 18s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6664
JIRA Issue HBASE-29108
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux e254478aee24 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 31769b7905ef6c74eae81df7d0d60c1c53cb8105
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 87 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · Feb 04 '25 14:02

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 29s Docker mode activated.
-0 :warning: yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 3m 2s master passed
+1 :green_heart: compile 0m 56s master passed
+1 :green_heart: javadoc 0m 27s master passed
+1 :green_heart: shadedjars 5m 55s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 3m 4s the patch passed
+1 :green_heart: compile 0m 56s the patch passed
+1 :green_heart: javac 0m 56s the patch passed
+1 :green_heart: javadoc 0m 26s the patch passed
+1 :green_heart: shadedjars 5m 48s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 :green_heart: unit 217m 24s hbase-server in the patch passed.
242m 25s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/6664
JIRA Issue HBASE-29108
Optional Tests javac javadoc unit compile shadedjars
uname Linux 6cde7fd9a8ee 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 31769b7905ef6c74eae81df7d0d60c1c53cb8105
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/testReport/
Max. process+thread count 5191 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase · Feb 04 '25 18:02

@Apache9 @ndimiduk do you know who could review this change?

tomasbanet · Feb 13 '25 17:02

The upper layer will not call abortWriter when closeWriter throws exception? Strange...

Apache9 · Feb 14 '25 14:02

Heya @tomasbanet do you have a Jira ID? I'll need an ID in our Jira system in order to assign this contribution to you. If you do not, please make one via the self-serve sign-up page and let me know. Thanks!

ndimiduk · Feb 17 '25 11:02

The upper layer will not call abortWriter when closeWriter throws exception? Strange...

Nope, seems it does not.

ndimiduk · Feb 17 '25 12:02

@tomasbanet are you able to capture the failure scenario as a new test method in TestCompaction? Ideally we'd add some coverage for this scenario that does not require a real cluster. Thanks.

ndimiduk · Feb 17 '25 12:02

Looks like we've had this behavior of preserving the failed output file for a long time, since at least HBASE-4078 -- TestCompaction#testCompactionWithCorruptResult().

ndimiduk · Feb 17 '25 15:02

Heya @tomasbanet any thoughts here?

ndimiduk · Mar 31 '25 19:03