[SUPPORT] /hoodie/temp Folder and contents not getting deleted
Tips before filing an issue
- Have you gone through our FAQs?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
A clear and concise description of the problem.
When writing to tables in S3, Hudi creates .hoodie/.temp/<commit_instant> artifacts in the table's metadata folder. After the write completes, the temp artifacts are deleted along with the .temp/ folder. For a couple of our tables, we have noticed that the temp artifacts were never deleted. We want to figure out why this occurred, and whether it is safe to manually delete the artifacts remaining from past writes.
- hoodie.datasource.write.operation is upsert for these operations
- Hudi 0.8.0
To Reproduce
Steps to reproduce the behavior:
Not sure. We write to the same table every 10 minutes, and we have seen this occur once each for a couple of tables.
Expected behavior
We expect that the temp artifacts are deleted after each write to a table
Environment Description
- Hudi version : 0.8.0
- Spark version : Spark 2.4.7
- Hive version : Hive 2.3.7
- Hadoop version : Amazon 2.10.1
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) : No
Hudi stores marker files in the temp folder to track uncommitted data files. My question is: did those commit instants actually complete? You can check by querying for records where _hoodie_commit_time = commit_instant, or by looking for the instant in the .hoodie/ folder.
The instant 20220502180145 is in the .hoodie/.temp/ folder, but not in the .hoodie/ folder. I also see no records when querying _hoodie_commit_time = 20220502180145
@fengjian428 : can you follow up here please.
@desaismi could you try the latest version to check whether this issue still exists?
Hello @fengjian428, we have some dependencies in our data pipeline that make upgrading to the latest version non-trivial. Is this a known issue for 0.8.0?
If it's a rare intermittent issue, would it be safe to manually remove this marker file from the temp folder?
Yeah, it should be safe to remove the marker files if there is no relevant inflight instant in the timeline.
We had a bug around compaction not cleaning up the marker files, which was fixed in 0.10.0 (https://github.com/apache/hudi/pull/3576). So yes, we do know of some situations where marker files were not cleaned up.
Yes, if there are no matching commit files in /.hoodie/, you can remove the directories from the /.hoodie/.temp folder.
feel free to close out the issue, if you don't have any follow ups.
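For anyone doing the same cleanup, the check described above (a directory under .hoodie/.temp/ with no matching instant file in .hoodie/) could be sketched roughly as follows. This is only an illustration, not part of Hudi: the function name orphaned_marker_dirs is made up, the listings are assumed to be fetched separately (for example with your S3 client of choice), and it assumes instant directories are named by a purely numeric timestamp. It treats any timeline file that starts with the instant (completed, requested, or inflight) as a match, does not look at archived instants, and only reports candidates without deleting anything.

```python
import re
from typing import Iterable, List

# Assumption: instants are purely numeric timestamps, e.g. 20220502180145.
INSTANT_RE = re.compile(r"^\d+$")


def orphaned_marker_dirs(temp_entries: Iterable[str],
                         timeline_entries: Iterable[str]) -> List[str]:
    """Return instants found under .hoodie/.temp/ that have no file
    for the same instant directly under .hoodie/.

    temp_entries:     names of the directories listed under .hoodie/.temp/
    timeline_entries: names of the files listed directly under .hoodie/
    """
    # Collect every instant that appears in the timeline, whatever its
    # state (e.g. "<instant>.commit", "<instant>.inflight").
    timeline_instants = set()
    for name in timeline_entries:
        prefix = name.split(".", 1)[0]
        if INSTANT_RE.match(prefix):
            timeline_instants.add(prefix)

    orphans = []
    for entry in temp_entries:
        instant = entry.rstrip("/")  # tolerate "20220502180145/" style listings
        if INSTANT_RE.match(instant) and instant not in timeline_instants:
            orphans.append(instant)
    return sorted(orphans)


if __name__ == "__main__":
    # 20220502180145 is the instant from this thread; the other
    # timestamps are invented for the example.
    temp = ["20220502180145", "20220502181145"]
    timeline = [
        "20220502175145.commit",
        "20220502181145.commit.requested",
        "20220502181145.inflight",
        "hoodie.properties",
        ".temp",
    ]
    print(orphaned_marker_dirs(temp, timeline))  # ['20220502180145']
```

An instant that still has a requested or inflight file in the timeline is deliberately left out of the result, since the advice above is to remove markers only when no relevant inflight instant exists.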