hudi
[SUPPORT] DELETE_PARTITION causes AWS Athena Query failure
Describe the problem you faced
A clear and concise description of the problem.
To Reproduce
Steps to reproduce the behavior:
- DELETE_PARTITION for non-existing partition ( e.g. org_id=55555 )
- since it raises an exception, you have to wrap the Spark write.
- this operation creates org_id=55555_$folder$ in the Hudi table path ( BTW, why is it even created? )
- UPSERT to other partition ( e.g. org_id=24 )
- Check the current status
- you will see org_id=55555 partition is in Glue Catalog
- Go to Athena / Run Query
- you will see that the query will fail due to the missing path org_id=55555 in S3
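For readers who want to try the steps above, here is a minimal sketch of how the DELETE_PARTITION write might be set up and wrapped. It only builds the option map and wraps a callable, so it runs without Spark. The helper names `delete_partition_options` and `run_guarded` and the table name `my_table` are hypothetical. The option keys are the Hudi Spark datasource settings as I understand them, so please check them against the docs for your Hudi version.

```python
def delete_partition_options(table_name, partitions):
    """Build Hudi write options for a DELETE_PARTITION operation.

    The keys are assumed to match the Hudi Spark datasource configs;
    verify them against the documentation for your Hudi version.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "delete_partition",
        # Comma-separated list of partition paths to drop.
        "hoodie.datasource.write.partitions.to.delete": ",".join(partitions),
    }


def run_guarded(write_fn):
    """Run a write callable and return the exception instead of raising it."""
    try:
        write_fn()
        return None
    except Exception as exc:  # broad on purpose: this is only a repro wrapper
        return exc


opts = delete_partition_options("my_table", ["org_id=55555"])

# With a SparkSession and an (empty) DataFrame `df`, the write would look
# roughly like this (not executed here):
#   run_guarded(lambda: df.write.format("hudi")
#                         .options(**opts).mode("append").save(base_path))
```

Returning the exception rather than swallowing it keeps the failure visible, which matters here because the question in this issue is what state the table and catalog are left in after the failed write.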
Expected behavior
org_id=55555 MUST NOT be registered in the Glue Catalog
Environment Description
- Hudi version : 0.10.1
- Spark version : 3.1.1-amzn-0
- Hive version : 2.3.7-amzn-4
- Hadoop version : 3.2.1-amzn-3
- Storage (HDFS/S3/GCS..) : S3
- Running on Docker? (yes/no) : NO
@Gatsby-Lee You mentioned
since it will raise an exception, you have to wrap the Spark Write.
What exception did you get? I ran a test locally and tried to delete a non-existing partition. The replacecommit due to DELETE_PARTITION succeeded in my case (though it did not delete any data). It sounds counter-intuitive, and one would expect to fail fast in such scenarios, but we do not do so because detecting non-existing partitions requires listing, which is a costly operation. Instead, from 0.11.0 onwards, the partitions are lazily deleted by the cleaner. If a partition does not exist, then even though the DELETE_PARTITION operation will succeed, nothing will be deleted and no extra metadata folder will be created. Can you please try again after upgrading to version 0.11.1?
Btw, org_id=55555_$folder$ may be an S3 thing. Did the partition org_id=55555 ever exist before?
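To check whether only a marker object exists for a partition, one could scan the object keys under the table path. The sketch below assumes the keys have already been listed (for example with an S3 list call) and that the table has a single partition level. The function name `marker_only_partitions` and the sample keys are hypothetical.

```python
FOLDER_MARKER_SUFFIX = "_$folder$"


def marker_only_partitions(keys, table_prefix):
    """Return partitions that have a folder marker but no objects under them.

    Assumes a single partition level directly below `table_prefix`.
    """
    prefix = table_prefix.rstrip("/") + "/"
    marked = set()      # partitions with a "<name>_$folder$" marker object
    populated = set()   # top-level directories that contain real objects
    for key in keys:
        if not key.startswith(prefix):
            continue
        rel = key[len(prefix):]
        if "/" in rel:
            populated.add(rel.split("/", 1)[0])
        elif rel.endswith(FOLDER_MARKER_SUFFIX):
            marked.add(rel[: -len(FOLDER_MARKER_SUFFIX)])
    return sorted(marked - populated)


sample_keys = [
    "tbl/.hoodie/hoodie.properties",
    "tbl/org_id=24_$folder$",
    "tbl/org_id=24/part-0001.parquet",
    "tbl/org_id=55555_$folder$",
]
orphans = marker_only_partitions(sample_keys, "tbl")
```

With the sample keys, `org_id=24` has both a marker and a data file, while `org_id=55555` has only the marker, so only the latter is reported.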
@codope hi,
First, as of 0.11.x, DELETE_PARTITION ( in AWS Glue Catalog ) doesn't fail or raise exception. ( It's different from 0.10.x ) Second, like you said the actual delete is done by cleaner ( lazy ), but before the actual delete, Hudi seems to try to delete metadata in AWS Glue Catalog first. Third, org_id=55555 has never existed.
I will try to replicate the issue with 0.11.1 and post the output here. ( I don't remember if I reproduced this issue with 0.11.0 or not. Anyway, I will try again )
@Gatsby-Lee Thanks for the info. I will wait for more updates from you after testing with 0.11.1. However, I can think of one thing we can improve. Before executing the delete_partition command, we can check whether the partition exists, log a warning if it does not, and return early without making any modification. HUDI-4591 to track this.
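The check described above could look roughly like the sketch below. This is not the HUDI-4591 implementation, only an illustration of the idea: `guarded_delete`, `list_partitions`, and `delete_fn` are hypothetical names, and the caller supplies how partitions are listed and deleted. Note that the listing is exactly the costly step mentioned earlier in this thread, so this trades extra latency for safety.

```python
import logging

logger = logging.getLogger("delete_partition_guard")


def split_existing(requested, existing):
    """Split requested partitions into (found, missing), preserving order."""
    existing_set = set(existing)
    found = [p for p in requested if p in existing_set]
    missing = [p for p in requested if p not in existing_set]
    return found, missing


def guarded_delete(requested, list_partitions, delete_fn):
    """Delete only partitions that exist; warn about and skip the rest.

    `list_partitions` and `delete_fn` are injected by the caller, e.g. a
    catalog or storage listing and a function that issues the Hudi write.
    """
    found, missing = split_existing(requested, list_partitions())
    for partition in missing:
        logger.warning("Partition %s does not exist; skipping", partition)
    if not found:
        # Return early without touching the table or the catalog.
        return []
    delete_fn(found)
    return found


calls = []
skipped = guarded_delete(["org_id=55555"], lambda: ["org_id=24"], calls.append)
deleted = guarded_delete(
    ["org_id=24", "org_id=55555"], lambda: ["org_id=24"], calls.append
)
```

In the first call nothing exists to delete, so `delete_fn` is never invoked. In the second call only `org_id=24` is passed on.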
@Gatsby-Lee : any updates here please.
@Gatsby-Lee Gentle reminder. Can we close this issue?
Hi, let's close the issue if I am the only one facing it.
Let me write more details before I forget.
A couple of months ago, I tried the DELETE_PARTITION operation with 0.10.1 and 0.11.0. I noticed that 0.11.0 and 0.10.1 behave differently when Hudi runs DELETE_PARTITION on a non-existing partition.
- 0.10.1 raised an exception and failed. ( the serious issue was that Hudi became unstable )
- 0.11.0 was silent. ( VC told me that this is not the right behavior either. It should raise an exception )
I wasn't able to use 0.11.0 because it has a compatibility issue in AWS Glue. ( it was related to AWS Glue Catalog ) I wasn't able to use 0.10.1 as-is because it has a bug in ZookeeperLockProvider.
I ended up using 0.10.1 + a patch that fixes the ZookeeperLockProvider ( available in 0.11.1 ). I also added logic that checks whether the target partition exists ( cc @codope )
I will test with 0.11.1 and reopen this ticket if I still notice a similar issue.
Thank you Gatsby
@Gatsby-Lee Thanks, I have noted your point. Would you mind upstreaming your fix (the logic that checks whether the target partition exists)? I believe this would be helpful for other users as well. If so, please assign HUDI-4591 to yourself.