[SUPPORT] hive-sync

Open clp007 opened this issue 1 year ago • 4 comments

To Reproduce

Is there a hive_sync parameter that makes each sync handle only the incrementally changed partitions, so that it no longer re-creates partitions that are missing from the Hive metastore? I clean up historical Hive partitions on purpose, so that the number of partitions in Hive stays stable instead of growing all the time.

Environment Description

  • Hudi version : 0.14.1

  • Spark version : spark3.3

  • Hive version : 3.1.3

  • Hadoop version : 3.3.6

  • Storage (HDFS/S3/GCS..) : GCS

  • Running on Docker? (yes/no) : no

clp007 • Aug 02 '24 02:08

This is actually caused by an inconsistency between Hive's partition metadata and Hudi's partition metadata.

In my opinion, we should change the approach: for example, when cleaning up a Hive partition, also delete the corresponding Hudi partition metadata.

BruceKellan • Aug 02 '24 04:08

Thank you for your reply, that is a good idea. Can you suggest a way to safely delete the Hudi partition metadata? It feels like a dangerous operation.

clp007 • Aug 02 '24 06:08
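
One hedged way to approach the "safe delete" question is to drop the partition through Hudi itself rather than removing files or metadata by hand. Hudi's Spark SQL support documents an `ALTER TABLE ... DROP PARTITION` statement. The sketch below only builds such statements; the table name `hudi_db.events` and the partition column `dt` are hypothetical, and whether the drop also propagates to the Hive metastore in a given setup should be verified on a test table first.

```python
import re
from typing import Dict

# Conservative whitelist so that partition columns and values cannot break out
# of the generated SQL. Anything outside this set is rejected, not escaped.
_SAFE_TOKEN = re.compile(r"[A-Za-z0-9_.\-]+")


def build_drop_partition_sql(table: str, partition_spec: Dict[str, str]) -> str:
    """Build an ALTER TABLE ... DROP PARTITION statement for one partition.

    `table` and the keys of `partition_spec` are placeholders chosen for this
    example; they are not taken from the issue.
    """
    if not partition_spec:
        raise ValueError("partition_spec must name at least one partition column")
    clauses = []
    for column, value in partition_spec.items():
        if not _SAFE_TOKEN.fullmatch(column) or not _SAFE_TOKEN.fullmatch(value):
            raise ValueError(f"unsafe partition column or value: {column}={value}")
        clauses.append(f"{column} = '{value}'")
    return f"ALTER TABLE {table} DROP PARTITION ({', '.join(clauses)})"


# Hypothetical usage inside a Spark session that has the Hudi extensions loaded:
#   spark.sql(build_drop_partition_sql("hudi_db.events", {"dt": "2024-01-01"}))
statement = build_drop_partition_sql("hudi_db.events", {"dt": "2024-01-01"})
print(statement)
```

Because the drop goes through Hudi, the table's own metadata records that the partition is gone, which is the consistency BruceKellan's comment is asking for. The same effect can be requested from the DataFrame writer with `hoodie.datasource.write.operation=delete_partition` and `hoodie.datasource.write.partitions.to.delete`.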

> Because I will clean up the historical hive partition data to ensure that there is a stable amount of partition data in hive instead of growing all the time.

That's a pragmatic idea. Would you mind contributing it? It should be a minor piece of work.

danny0405 • Aug 03 '24 02:08

@BruceKellan How are you planning to clean up your partitions?

ad1happy2go • Aug 22 '24 09:08

Shouldn't we try to leverage the partition TTL support in Hudi to delete older partitions?

nsivabalan • Sep 13 '24 19:09
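
The thread does not give the configuration keys for partition TTL or say which Hudi release ships it, so the following is only a plain-Python illustration of the keep-by-time idea, not Hudi's implementation. It assumes partition values are dates in `YYYY-MM-DD` form and that a partition expires once its date falls before a retention cutoff; the function name and parameters are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_partitions(
    partition_values: Iterable[str],
    today: date,
    retain_days: int,
    fmt: str = "%Y-%m-%d",
) -> List[str]:
    """Return the partition values whose date is older than the retention window.

    Assumption for this sketch: age is derived from the partition value itself.
    A real TTL policy may instead use the last time a partition was written.
    """
    if retain_days < 0:
        raise ValueError("retain_days must be non-negative")
    cutoff = today - timedelta(days=retain_days)
    expired = []
    for value in partition_values:
        partition_date = datetime.strptime(value, fmt).date()
        if partition_date < cutoff:
            expired.append(value)
    return sorted(expired)


# Example: with a 30-day window on 2024-08-02 the cutoff is 2024-07-03.
old = expired_partitions(
    ["2024-08-01", "2024-07-01", "2024-06-15"], date(2024, 8, 2), 30
)
print(old)
```

The expired values could then be passed to whichever partition-delete mechanism the table uses, so that Hive and Hudi are cleaned up by the same operation rather than separately.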