kafka-connect-storage-cloud
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/FileOutputFormat
Using the 10.1.1 connector downloaded from https://www.confluent.io/hub/confluentinc/kafka-connect-s3, configured to write data in Parquet format (format.class: io.confluent.connect.s3.format.parquet.ParquetFormat), the worker throws a java.lang.NoClassDefFoundError exception upon consuming a message:
[2022-08-25 15:30:08,520] ERROR [dfs6307956c2b55f364871599a5|task-0] WorkerSinkTask{id=dfs6307956c2b55f364871599a5-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: org/apache/hadoop/mapreduce/lib/output/FileOutputFormat (org.apache.kafka.connect.runtime.WorkerSinkTask:609)
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/FileOutputFormat
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:95)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:285)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:675)
at io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:102)
at io.confluent.connect.s3.format.S3RetriableRecordWriter.write(S3RetriableRecordWriter.java:46)
at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:107)
at io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:554)
at io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:303)
at io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:246)
at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:197)
at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 31 more
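For reference, a minimal sink configuration along these lines reproduces the setup. The connector name, topic, bucket, and region below are illustrative placeholders, and converter settings (Parquet requires a schema-aware converter) are inherited from the worker and omitted here:

{
  "name": "s3-parquet-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "tasks.max": "1",
    "topics": "example-topic",
    "s3.bucket.name": "example-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "flush.size": "3"
  }
}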
It appears that hadoop-mapred-0.22.0.jar is missing from the lib directory in the downloaded zip file. Manually adding this file to the plugin's lib directory avoids the error.
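To check whether the plugin's lib directory can actually resolve the class the Parquet writer needs, a one-off sketch like the following works; the class name CheckClass and the classpath are assumptions (the path reflects a default Confluent Hub install, adjust to your own plugin location):

// Hypothetical diagnostic; compile and run against the plugin's jars, e.g.:
//   javac CheckClass.java
//   java -cp ".:share/confluent-hub-components/confluentinc-kafka-connect-s3/lib/*" CheckClass
public class CheckClass {
    public static void main(String[] args) {
        // The class the stack trace above fails on.
        String name = "org.apache.hadoop.mapreduce.lib.output.FileOutputFormat";
        try {
            Class.forName(name);
            System.out.println(name + " is resolvable");
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is NOT on the classpath");
        }
    }
}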
Yes, I accidentally updated the Connect cluster with Ansible, and the config was upgraded, including the plugins. Thank you @spanglerco, this actually helped.
You shouldn't use hadoop-mapred; you should use hadoop-common, at least version 2.10.2.
Hi, I have the same issue and have to add the hadoop-mapred:0.22.0 lib for the connector to work. @OneCricketeer, hadoop-common:2.10.2 is present in the connector artifact, so that's not the issue.
What about mapreduce-client-core? That's where that class currently exists: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/pom.xml
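For anyone bundling the dependency themselves, a sketch of the Maven coordinates involved; the version 2.10.2 is an assumption, chosen to match the hadoop-common already shipped with the connector:

<dependency>
  <!-- Ships org.apache.hadoop.mapreduce.lib.output.FileOutputFormat -->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.10.2</version>
</dependency>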
My main point is that adding older versions of Hadoop libraries will lead to potential security vulnerabilities and other incompatibilities.
@OneCricketeer yes, mapreduce-client-core also works. It's still a 1.5 MB lib.
This is still an issue in version 10.2.0. Manually adding the missing hadoop-mapred library is not a solution for us. Was this library removed by accident or on purpose?
The missing library is back again in v10.2.1, so I guess this issue can be closed. It would be nice if Confluent referenced the issue ID in their commit messages next time.