[SUPPORT] Hive sync tool on EMR failed, Exception in thread main java.lang.NoClassDefFoundError: com/fasterxml/...

Open huliwuli opened this issue 1 year ago • 4 comments

Describe the problem you faced

I ran async clustering on EMR 6.14, and afterwards the Hive table queried through Athena did not pick up the latest commit. I want to use the Hive sync tool to sync it.

When using

cd /usr/lib/hudi/bin

./run_sync_tool.sh --base-path s3://<bucket_name>/<prefix>/<table_name> --database <database_name> --table <table_name> --partitioned-by <column_name>

I got an error caused by java.lang.ClassNotFoundException: com.fasterxml.jackson.datatype.jsr310.JavaTimeModule.

Also, I noticed that the AWS documentation includes --use-jdbc false,

so I did

cd /usr/lib/hudi/bin

./run_sync_tool.sh --base-path s3://<bucket_name>/<prefix>/<table_name> --database <database_name> --table <table_name> --partitioned-by <column_name> --sync-mode hms --use-jdbc false --sync-tool-classes org.apache.hudi.hive.MultiPartKeysValueExtractor

Then I got: 'false' but no main parameter was defined in your arg class

Environment Description

Hudi version : 0.13.0

Spark version : 3.4.1

Hive version : 0.13.1

Hadoop version :

Storage (HDFS/S3/GCS..) : S3

Running on Docker? (yes/no) : NO

huliwuli avatar Feb 23 '24 20:02 huliwuli

Looks like a jackson jar conflict.

danny0405 avatar Feb 24 '24 04:02 danny0405

Is there anything I can do for this issue?

huliwuli avatar Feb 24 '24 19:02 huliwuli

Find out where the legacy Jackson jar comes from and remove it from the classpath.
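One way to approach that search is to list which jars on the node actually bundle the class named in the stack trace. The snippet below is only a minimal sketch: find_class_in_jars is a hypothetical helper name, it assumes bash and unzip are available on the node, and the directories you scan are up to you (/usr/lib/hudi is the path used earlier in this thread; any other directory is a guess about your cluster layout).

```shell
# Print every jar under a directory that bundles a given class.
# Usage: find_class_in_jars <dir> <fully.qualified.ClassName>
# NOTE: find_class_in_jars is a hypothetical helper, not part of Hudi or EMR.
find_class_in_jars() {
  local dir="$1"
  # Convert the class name to its path inside a jar, e.g. a.b.C -> a/b/C.class
  local entry="${2//.//}.class"
  local jar
  # No -type filter, so symlinked jar files are checked as well
  find "$dir" -name '*.jar' 2>/dev/null | while IFS= read -r jar; do
    # unzip -l lists archive entries without extracting them
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, find_class_in_jars /usr/lib/hudi com.fasterxml.jackson.datatype.jsr310.JavaTimeModule prints the jars that contain the missing class. No output for a directory means no jar there provides it. Several hits with different version numbers in their file names would point to the kind of conflict suspected above.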

danny0405 avatar Feb 25 '24 00:02 danny0405

OK, thanks, I will try. Since it's on EMR, I'm not sure whether I have permission to remove it. Or do you know which EMR/Hudi version would resolve this issue?

huliwuli avatar Feb 25 '24 16:02 huliwuli

@huliwuli Did you try using 0.14.1? 0.13.0 did not even support Spark 3.4.

ad1happy2go avatar Feb 27 '24 14:02 ad1happy2go

Sorry, it looks like you are using the AWS-managed Hudi. Can you try emr-6.15.0, which ships Hudi 0.14.0?

ad1happy2go avatar Feb 27 '24 14:02 ad1happy2go

EMR 6.15 worked. I tested it yesterday.

huliwuli avatar Feb 27 '24 15:02 huliwuli

Great! Thanks a lot, @huliwuli. Closing this issue then. Please reopen it if you have any concerns.

ad1happy2go avatar Feb 27 '24 15:02 ad1happy2go