kafka-connect-hdfs

Confluent Kafka Connect Docker container java.io.EOFException

Open huwdjones opened this issue 6 years ago • 1 comment

Hi there,

I'm trying to connect my Confluent Kafka Connect Docker container to an HDFS Docker container (sequenceiq/hadoop:2.7.1) using the bundled Confluent HDFS connector.

All containers run on the same Docker network, with ports 8020 and 9000 exposed on the HDFS Docker container.

My HDFS connector properties are:

{ "name": "hdfs-sink", "config": { "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector", "tasks.max": "1", "topics": "kafka-test", "hdfs.url": "hdfs://host.docker.internal:9000", "flush.size": "3", "name": "hdfs-sink" } }

I am able to write files locally from within the HDFS Docker container; however, when attempting this via the HDFS connector, I see the following error:

[2018-07-31 13:23:18,491] INFO Couldn't start HdfsSinkConnector: (io.confluent.connect.hdfs.HdfsSinkTask)
org.apache.kafka.connect.errors.ConnectException: java.io.EOFException: End of File Exception between local host is: "05a1d1b381da/172.21.0.4"; destination host is: "host.docker.internal":9000; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at io.confluent.connect.hdfs.storage.HdfsStorage.exists(HdfsStorage.java:109)
    at io.confluent.connect.hdfs.DataWriter.createDir(DataWriter.java:531)
    at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:220)
    at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:101)
    at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:82)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:281)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:170)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException: End of File Exception between local host is: "05a1d1b381da/172.21.0.4"; destination host is: "host.docker.internal":9000; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy53.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy54.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
    at io.confluent.connect.hdfs.storage.HdfsStorage.exists(HdfsStorage.java:107)
    ... 13 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1084)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)
[2018-07-31 13:23:18,493] INFO Shutting down HdfsSinkConnector. (io.confluent.connect.hdfs.HdfsSinkTask)

I have tried both ports 8020 and 9000 and see the same error. Notably, this is not the error I get when I deliberately set hdfs.url to an invalid value, which gives "java.net.ConnectException: Connection refused" instead. Does that mean the problem is something other than basic connectivity?
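One way to narrow this down is to check the TCP path from inside the Connect container separately from the Hadoop RPC layer. A rough sketch, assuming hypothetical container names "connect" and "hadoop" and that nc is available in the images:

# If this TCP connect succeeds while the connector still throws
# EOFException, the port is reachable but the RPC handshake fails,
# i.e. it may not be the NameNode RPC port at that address
docker exec -it connect bash -c 'nc -vz host.docker.internal 9000'

# Inside the Hadoop container, check which address/port the
# NameNode actually advertises (fs.defaultFS)
docker exec -it hadoop hdfs getconf -confKey fs.defaultFS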

huwdjones • Jul 31 '18 13:07

> to a HDFS Docker container (sequenceiq/hadoop:2.7.1)

Then you should use the service name of the Docker container that hosts the Hadoop NameNode, not host.docker.internal.

A good way to set this up would be with Docker Compose. Otherwise, refer to the Docker documentation on connecting containers to bridge networks: attach the Hadoop container to a network first, then run the Connect container with the same network name, as sketched below.
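A minimal sketch of the bridge-network approach. The network and container names are illustrative, the Connect image is an assumption, and the sequenceiq image may need its own bootstrap command; this is not a tested setup:

# Create a user-defined bridge network
docker network create hadoop-net

# NameNode container, named "namenode" so other containers on
# hadoop-net can resolve it by that name via Docker's embedded DNS
docker run -d --network hadoop-net --name namenode sequenceiq/hadoop:2.7.1

# Kafka Connect worker joined to the same network
docker run -d --network hadoop-net --name connect -p 8083:8083 confluentinc/cp-kafka-connect

With both containers on the same user-defined bridge network, hdfs.url can then point at the container name, e.g. hdfs://namenode:9000, using whichever port fs.defaultFS reports inside the Hadoop container.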

OneCricketeer • Apr 02 '19 20:04