HBASE-29108: Cleanup storefiles when file close fails during compaction
Environment
Hadoop server version: 3.4.0, revision=bd8b77f398f626bb7791783192ee7a5dfaeec760
HBase version: 2.5.10-hadoop3, revision=a3af60980c61fb4be31e0dcd89880f304d01098a
Hadoop client version used by HBase: 3.3.6, revision=1be78238728da9266a4f88195058f08fd012bf9c
Relevant hdfs-site.xml config:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.replication.min</name>
  <value>2</value>
</property>
HBase table configured with DFS_REPLICATION => 2.
What
Clean up storefiles in the .tmp directory when a compaction fails due to an IOException from writer.close() in DefaultCompactor.commitWriter(), to avoid leaving behind unused, under-construction storefiles.
In our test clusters we observed multiple instances of the following exception when compacting a table with DFS_REPLICATION => 2 on a cluster where dfs.namenode.replication.min is set to 2, while an HDFS datanode that was writing the compaction result is SIGKILL'ed.
Regionserver logs:
2025-01-27 03:53:31,758 INFO [regionserver/regionserver-4:16020-longCompactions-0] hdfs.DFSClient: Could not complete /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325 retrying…
2025-01-27 03:53:38,159 INFO [regionserver/regionserver-4:16020-longCompactions-0] hdfs.DFSClient: Could not complete /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325 retrying…
2025-01-27 03:53:38,161 ERROR [regionserver/regionserver-4:16020-longCompactions-0] regionserver.CompactSplit: Compaction failed region=test_table,\x01,1726087406036.27f01f729d8d992d100222f6fce09565., storeName=27f01f729d8d992d100222f6fce09565/e, priority=12, startTime=1737949992446
java.io.IOException: Unable to close file because the last block BP-984347226-240.160.11.12-1670593994803:blk_1079072071_5477525 does not have enough number of replicas.
  at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:969)
  at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:909)
  at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:892)
  at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:847)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
  at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:829)
  at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:663)
  at org.apache.hadoop.hbase.regionserver.StoreFileWriter$SingleStoreFileWriter.close(StoreFileWriter.java:786)
  at org.apache.hadoop.hbase.regionserver.StoreFileWriter$SingleStoreFileWriter.access$1200(StoreFileWriter.java:493)
  at org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:293)
  at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter(DefaultCompactor.java:70)
  at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter(DefaultCompactor.java:38)
  at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:393)
  at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
  at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:122)
  at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1151)
  at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2278)
  at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:667)
  at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:716)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.base/java.lang.Thread.run(Unknown Source)
Active namenode logs:
2025-01-27 03:53:38,160 INFO namenode.FSNamesystem: BLOCK* blk_1079072071_5477525 is COMMITTED but not COMPLETE(numNodes= 1 < minimum = 2) in file /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325
The storefile in the .tmp directory remains open/under-construction, as indicated by hdfs dfsadmin -listOpenFiles.
Later, the namenode may mark some of the block replicas of this storefile as corrupt:
FSCK started by hadoop (auth:SIMPLE) from /240.160.21.252 at Mon Jan 27 21:21:37 UTC 2025
Block Id: blk_1079072071
Block belongs to: /hbase/data/default/test_table/27f01f729d8d992d100222f6fce09565/.tmp/e/b49ffc9768a441598d2279abd6ba3325
No. of Expected Replica: 2
No. of live Replica: 1
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 1
Block replica on datanode/rack: datanode-v12-0-hadoop.hadoop/default-rack is CORRUPT ReasonCode: GENSTAMP_MISMATCH
Block replica on datanode/rack: datanode-v11-3-hadoop.hadoop/default-rack is HEALTHY
However, the corrupt replica is not repaired as long as the file remains open.
As a workaround, the region must be re-opened (e.g. by restarting the regionserver), which causes all files under .tmp to be deleted.
Implementation Notes
BAD behaviour:
In org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter() (DefaultCompactor.java:70), writer.close() is called; an IOException thrown by writer.close() is not caught.
Example GOOD behaviour:
In DefaultCompactor.abortWriter(), writer.close() is called inside a try-catch statement; if an IOException is caught, the storefiles written to the .tmp directory are deleted.
Fix:
Wrap the writer.close() call in commitWriter() in a try-catch statement, clean up the storefiles if writer.close() throws an IOException, and re-throw the exception to preserve the current behaviour of commitWriter().
Add a cleanup helper method that is called by both commitWriter() and abortWriter(); see the sketch below.
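A minimal sketch of the change inside DefaultCompactor is shown below. The exact commitWriter() signature depends on the branch, and the helper name deleteTmpStoreFiles() is illustrative; the actual patch may differ.

```java
// Sketch only: simplified fragment of DefaultCompactor.
@Override
protected List<Path> commitWriter(StoreFileWriter writer, FileDetails fd,
  CompactionRequestImpl request) throws IOException {
  List<Path> newFiles = Collections.singletonList(writer.getPath());
  writer.appendMetadata(fd.maxSeqId, request.isAllFiles(), request.getFiles());
  try {
    writer.close();
  } catch (IOException e) {
    // close() failed (e.g. "Unable to close file because the last block ...
    // does not have enough number of replicas"): the output in .tmp would be
    // left behind and remain open on HDFS. Delete it, then re-throw so the
    // compaction is still reported as failed.
    deleteTmpStoreFiles(newFiles);
    throw e;
  }
  return newFiles;
}

// Hypothetical helper, shared by commitWriter() and abortWriter(), that
// best-effort deletes the under-construction compaction outputs from .tmp.
private void deleteTmpStoreFiles(List<Path> files) {
  for (Path leftoverFile : files) {
    try {
      store.getFileSystem().delete(leftoverFile, false);
    } catch (IOException ioe) {
      LOG.warn("Failed to delete leftover compaction output {}", leftoverFile, ioe);
    }
  }
}
```

Re-throwing the exception keeps the existing contract of commitWriter(), so the caller still treats the compaction as failed; only the handling of the leftover .tmp output changes.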
Testing
Run 1 HBase master, 1 regionserver, and 3 HDFS datanodes with the hdfs-site.xml config given above.
Create the test table with DFS_REPLICATION => 2 (see the sketch below).
Pre-split the table into 20 regions (enough that region files are spread across all 3 datanodes), then write and flush less than 128 MiB of data to each region so that each region has 1 storefile consisting of a single HDFS block.
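For reference, the pre-split table with reduced storefile replication can be created via the Java Admin API along these lines. This is a sketch: the class name, split boundaries, and key range are illustrative, not taken from the actual test setup.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTestTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
      Admin admin = conn.getAdmin()) {
      TableDescriptor desc = TableDescriptorBuilder.newBuilder(TableName.valueOf("test_table"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("e"))
          // DFS_REPLICATION => 2: storefiles of this family are written with
          // HDFS replication 2, equal to dfs.namenode.replication.min, so losing
          // one replica while the writer is being closed makes completeFile() fail.
          .setDFSReplication((short) 2)
          .build())
        .build();
      // Pre-split into 20 regions so the storefiles spread across all 3 datanodes.
      admin.createTable(desc, new byte[] { 0x01 }, new byte[] { (byte) 0xff }, 20);
    }
  }
}
```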
Run the following command to repeatedly trigger major compactions, SIGKILL 1 datanode, and restart the datanodes:
while true; do echo "major_compact 'test_table'" | bin/hbase shell -n && sleep $(( RANDOM % 10 )) && kill -SIGKILL $(jps | grep "DataNode" | cut -d ' ' -f 1 | head -n 1); command_to_restart_all_datanodes; sleep 30; done
After ~10 minutes, observe that dozens of storefiles are left behind in .tmp directories:
25-01-28 17:21:25 :: hdfs dfs -ls -R /hbase/data/default/test_table/ | grep ".tmp/e/"
-rw-r--r-- 2 tomasb supergroup 34499479 2025-01-28 17:06 /hbase/data/default/test_table/05a7b0914c86b057f63751e8048051c1/.tmp/e/43267a0cc82546c7a35e1ab9802c2ab6
-rw-r--r-- 2 tomasb supergroup 34499410 2025-01-28 17:09 /hbase/data/default/test_table/05a7b0914c86b057f63751e8048051c1/.tmp/e/c3c14f169c434cf19e857305ec740d3d
-rw-r--r-- 2 tomasb supergroup 69328898 2025-01-28 17:03 /hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/164b07ca7dc2426a8b6addcaccab8a56
-rw-r--r-- 2 tomasb supergroup 69328932 2025-01-28 17:10 /hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/34b0f11a90174862a8aa98d5017d4897
-rw-r--r-- 2 tomasb supergroup 69328898 2025-01-28 16:39 /hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/7d707278ded847f29434a07a762e07b6
-rw-r--r-- 2 tomasb supergroup 34499514 2025-01-28 17:04 /hbase/data/default/test_table/46e9610f88a53238f4e493442c1f559f/.tmp/e/07aedd687623493c9fd5b5ac1659b04d
-rw-r--r-- 2 tomasb supergroup 34499444 2025-01-28 17:09 /hbase/data/default/test_table/46e9610f88a53238f4e493442c1f559f/.tmp/e/4ab35a08cf954dd0af32ae316ceefe42
-rw-r--r-- 2 tomasb supergroup 68477765 2025-01-28 17:06 /hbase/data/default/test_table/4713d8c0cd11eded425fa66ce65a319a/.tmp/e/18ef189729c74cc0b912a238b25c277e
...
Observe that storefiles remain open/under-construction:
25-01-28 17:21:31 :: hdfs dfsadmin -listOpenFiles
Client Host Client Name Open File Path
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/be0f2fc0a596b51c3a261da3bd142df1/.tmp/e/5cf60cb9012e440e93f8d0bfefde530d
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/WALs/home-host-1,16020,1738081727251/home-host-1%2C16020%2C1738081727251.1738082328949
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/39e857016934b9ff7c96d8a7bbd28617/.tmp/e/7d707278ded847f29434a07a762e07b6
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/b168116917028b9d650f5ecfc46d4bf6/.tmp/e/9f499eb1d52a4fdf8fe3efea403c60bd
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/846e1635811bff4b99f90bca68689a74/.tmp/e/b13af9a221964abcbdb88607eb703bf3
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/WALs/home-host-1,16020,1738081727251/home-host-1%2C16020%2C1738081727251.meta.1738083661870.meta
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/92b0e358333c7bccf57a966a712cd560/.tmp/e/675c14a4c0fa43199cfc49785ed3f87f
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/846e1635811bff4b99f90bca68689a74/.tmp/e/10d42be38744490b958be09c4a3815b7
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/62bc3f7199a06af306d093f7a5fb38b9/.tmp/e/8d5ad0a40796482197055b2d19f2b4b9
192.168.0.21 DFSClient_NONMAPREDUCE_-1807495440_1 /hbase/data/default/test_table/92b0e358333c7bccf57a966a712cd560/.tmp/e/120bf1921c2a45ff8d3437e1fdaf07d8
...
Disable and enable the test table to re-open all regions and clear the files under the .tmp directories.
Apply the patch and run the test again. Observe that failed compactions no longer fill the .tmp directories with storefiles and that no storefiles are left open.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 30s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. | |
| _ master Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 21s | master passed | |
| +1 :green_heart: | compile | 3m 9s | master passed | |
| +1 :green_heart: | checkstyle | 0m 37s | master passed | |
| +1 :green_heart: | spotbugs | 1m 35s | master passed | |
| +1 :green_heart: | spotless | 0m 45s | branch has no errors when running spotless:check. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 5s | the patch passed | |
| +1 :green_heart: | compile | 3m 7s | the patch passed | |
| +1 :green_heart: | javac | 3m 7s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| +1 :green_heart: | checkstyle | 0m 35s | the patch passed | |
| +1 :green_heart: | spotbugs | 1m 38s | the patch passed | |
| +1 :green_heart: | hadoopcheck | 11m 42s | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. | |
| +1 :green_heart: | spotless | 0m 43s | patch has no errors when running spotless:check. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | asflicense | 0m 10s | The patch does not generate ASF License warnings. | |
| | | 38m 18s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6664 |
| JIRA Issue | HBASE-29108 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux e254478aee24 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 31769b7905ef6c74eae81df7d0d60c1c53cb8105 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 87 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
:confetti_ball: +1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 29s | Docker mode activated. | |
| -0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck | |
| _ Prechecks _ | ||||
| _ master Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 2s | master passed | |
| +1 :green_heart: | compile | 0m 56s | master passed | |
| +1 :green_heart: | javadoc | 0m 27s | master passed | |
| +1 :green_heart: | shadedjars | 5m 55s | branch has no errors when building our shaded downstream artifacts. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 3m 4s | the patch passed | |
| +1 :green_heart: | compile | 0m 56s | the patch passed | |
| +1 :green_heart: | javac | 0m 56s | the patch passed | |
| +1 :green_heart: | javadoc | 0m 26s | the patch passed | |
| +1 :green_heart: | shadedjars | 5m 48s | patch has no errors when building our shaded downstream artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 217m 24s | hbase-server in the patch passed. | |
| | | 242m 25s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/6664 |
| JIRA Issue | HBASE-29108 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 6cde7fd9a8ee 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 31769b7905ef6c74eae81df7d0d60c1c53cb8105 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/testReport/ |
| Max. process+thread count | 5191 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6664/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |
This message was automatically generated.
@Apache9 @ndimiduk do you know who could review this change?
The upper layer will not call abortWriter when closeWriter throws exception? Strange...
Heya @tomasbanet do you have a Jira ID? I'll need an ID in our Jira system in order to assign this contribution to you. If you do not, please create one via the self-serve sign-up page and let me know. Thanks!
> The upper layer will not call abortWriter when closeWriter throws exception? Strange...
Nope, seems it does not.
@tomasbanet are you able to capture the failure scenario as a new test method in TestCompaction? Ideally we'd add some coverage for this scenario that does not require a real cluster. Thanks.
Looks like we've had this behavior of preserving the failed output file for a long time, since at least HBASE-4078 -- TestCompaction#testCompactionWithCorruptResult().
Heya @tomasbanet any thoughts here?