[SUPPORT] Hudi sync requires hadoop and hive installed. Very heavy weight
**Describe the problem you faced**
Hudi sync requires Hadoop and Hive to be installed, which is very heavyweight. Since this is a utility, is it possible to isolate the specific libraries that Hudi Sync needs in order to run?
**To Reproduce**
Per `hudi-sync/hudi-hive-sync/run_sync_tool.sh`:
```
if [ -z "${HADOOP_HOME}" ]; then
  error_exit "Please make sure the environment variable HADOOP_HOME is setup"
fi
if [ -z "${HIVE_HOME}" ]; then
  error_exit "Please make sure the environment variable HIVE_HOME is setup"
fi
```
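To see the behavior without a full install, here is a minimal sketch of the same two guards wrapped in a function. It is not the actual script: `check_sync_env` is a hypothetical name, and `error_exit` is a stand-in written for illustration, since its real definition is not quoted here. The sketch uses `return` instead of exiting so it can be sourced safely.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the script's error_exit helper.
# Prints the message to stderr and reports failure to the caller.
error_exit() {
  echo "$1" >&2
  return 1
}

# Hypothetical wrapper around the two guards quoted from run_sync_tool.sh.
# Returns non-zero on the first missing variable, zero if both are set.
check_sync_env() {
  if [ -z "${HADOOP_HOME}" ]; then
    error_exit "Please make sure the environment variable HADOOP_HOME is setup"
    return 1
  fi
  if [ -z "${HIVE_HOME}" ]; then
    error_exit "Please make sure the environment variable HIVE_HOME is setup"
    return 1
  fi
  return 0
}
```

Calling `check_sync_env` in a shell where either variable is unset or empty fails with the corresponding message, which is the point at which the sync tool refuses to run.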
**Expected behavior**
Utilities should be lightweight.
I think it is the HMS client that requires these dependencies; it looks like Hive does not have a lightweight API for meta sync. Since 3.1, Hive has even added Calcite for meta sync validation.
Closing the issue, as not much can be done about this on the Hudi side.