kafka-connect-storage-cloud

java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/FileOutputFormat

Open · spanglerco opened this issue 2 years ago · 5 comments

Using the 10.1.1 connector downloaded from https://www.confluent.io/hub/confluentinc/kafka-connect-s3 configured to write data in parquet format (format.class: io.confluent.connect.s3.format.parquet.ParquetFormat), the worker throws a java.lang.NoClassDefFoundError exception upon consuming a message:

[2022-08-25 15:30:08,520] ERROR [dfs6307956c2b55f364871599a5|task-0] WorkerSinkTask{id=dfs6307956c2b55f364871599a5-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: org/apache/hadoop/mapreduce/lib/output/FileOutputFormat (org.apache.kafka.connect.runtime.WorkerSinkTask:609)

java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/FileOutputFormat
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:95)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:285)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:675)
at io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:102)
at io.confluent.connect.s3.format.S3RetriableRecordWriter.write(S3RetriableRecordWriter.java:46)
at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:107)
at io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:554)
at io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:303)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:246)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:197)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 31 more

It appears that hadoop-mapred-0.22.0.jar is missing from the lib directory in the downloaded zip file. Manually adding this file to the plugin's lib directory avoids the error.
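To confirm which (if any) jar in the plugin's lib directory actually bundles the missing class, a quick scan of the jars can help. This is a minimal sketch, not official tooling; the plugin path in the usage comment is an assumption based on a typical Confluent Hub install layout.

```python
# Sketch: scan a Connect plugin's lib directory for the class the JVM
# reported missing. The class name and paths are taken from / assumed
# from this thread, not from any official Kafka Connect tooling.
import os
import zipfile

def jars_containing(plugin_lib_dir, class_name):
    """Return the jar filenames under plugin_lib_dir that bundle class_name."""
    # A class org.foo.Bar is stored in a jar as the entry org/foo/Bar.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for name in sorted(os.listdir(plugin_lib_dir)):
        if not name.endswith(".jar"):
            continue
        path = os.path.join(plugin_lib_dir, name)
        with zipfile.ZipFile(path) as jar:  # jars are plain zip archives
            if entry in jar.namelist():
                hits.append(name)
    return hits

# Example usage (the plugin path below is hypothetical):
# jars_containing(
#     "/usr/share/confluent-hub-components/confluentinc-kafka-connect-s3/lib",
#     "org.apache.hadoop.mapreduce.lib.output.FileOutputFormat",
# )
```

If this returns an empty list for org.apache.hadoop.mapreduce.lib.output.FileOutputFormat, the class is genuinely absent from the plugin's classpath, which matches the NoClassDefFoundError above.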

spanglerco avatar Aug 25 '22 17:08 spanglerco

Yes, I accidentally updated the Connect cluster with Ansible and the config was upgraded, including the plugins. Thank you @spanglerco, this actually helped.

azadsagar avatar Aug 30 '22 14:08 azadsagar

You shouldn't use hadoop-mapred; you should use hadoop-common, at least version 2.10.2.

OneCricketeer avatar Sep 13 '22 15:09 OneCricketeer

Hi, I have the same issue and must add the hadoop-mapred:0.22.0 lib for the connector to work. @OneCricketeer, hadoop-common:2.10.2 is already present in the connector artifact, so that's not the issue.

loicmathieu avatar Sep 16 '22 12:09 loicmathieu

What about mapreduce-client-core?

That's where that class currently exists

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/pom.xml

My main point is that adding older versions of Hadoop libraries can introduce security vulnerabilities and other incompatibilities.

OneCricketeer avatar Sep 16 '22 12:09 OneCricketeer

@OneCricketeer yes, mapreduce-client-core also works. It's still a 1.5 MB lib.

loicmathieu avatar Sep 16 '22 13:09 loicmathieu
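For anyone rebuilding the connector themselves rather than dropping a jar into the lib directory, pulling in the artifact that currently ships this class might look like the sketch below. The version is illustrative only; it should be matched to the hadoop-common version already bundled with the connector.

```xml
<!-- hadoop-mapreduce-client-core is where
     org.apache.hadoop.mapreduce.lib.output.FileOutputFormat currently lives
     (see the pom.xml linked above). The version here is an assumption;
     align it with the connector's bundled hadoop-common. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.10.2</version>
</dependency>
```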

This is still an issue in version 10.2.0. Manually adding the missing hadoop-mapred library is not a solution for us. Was this library removed by accident or on purpose?

ismarslomic avatar Oct 07 '22 13:10 ismarslomic

The missing library is back in v10.2.1, so I guess this issue can be closed. It would be nice if Confluent could reference the issue ID in their commit message next time.

ismarslomic avatar Oct 07 '22 13:10 ismarslomic