
[SUPPORT] Hudi sync requires hadoop and hive installed. Very heavy weight

Open alberttwong opened this issue 1 year ago • 1 comments

**Describe the problem you faced**

Hudi sync requires Hadoop and Hive to be installed, which is very heavyweight for a utility. Is it possible to isolate which specific libraries Hudi Sync needs in order to run?

**To Reproduce**

Per `hudi-sync/hudi-hive-sync/run_sync_tool.sh`:

```
if [ -z "${HADOOP_HOME}" ]; then
  error_exit "Please make sure the environment variable HADOOP_HOME is setup"
fi

if [ -z "${HIVE_HOME}" ]; then
  error_exit "Please make sure the environment variable HIVE_HOME is setup"
fi
```
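
For anyone who wants to check a machine before running the tool, the guard above can be approximated with a small standalone check. This is only a minimal sketch of the pattern, not code from the script: `missing_env_vars` is a hypothetical helper name, and it assumes the variable names passed to it are trusted, since it uses `eval` for the lookup.

```shell
# Hypothetical helper (not part of run_sync_tool.sh): print the name of each
# listed variable that is unset or empty, and return 1 if any is missing.
missing_env_vars() {
  status=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    # eval keeps this portable to plain POSIX sh; only pass trusted names.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

Called as `missing_env_vars HADOOP_HOME HIVE_HOME`, it lists whichever of the two is unset or empty, so both problems show up in one run rather than one at a time.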

**Expected behavior**

Utilities should be lightweight.  

**Environment Description**

* Hudi version :

* Spark version :

* Hive version :

* Hadoop version :

* Storage (HDFS/S3/GCS..) :

* Running on Docker? (yes/no) :



alberttwong avatar Aug 29 '24 03:08 alberttwong

I think it is the HMS (Hive Metastore) client that requires these dependencies; it looks like Hive does not have a lightweight API for meta sync. Since 3.1, Hive has even added Calcite for meta sync validation.

danny0405 avatar Aug 29 '24 09:08 danny0405

Closing the issue, as not much can be done about this on the Hudi side.

ad1happy2go avatar Sep 05 '24 13:09 ad1happy2go