kafka-connect-hdfs

hdfs: java.io.FileNotFoundException: File does not exist, while the file was there

Open · yuliansen opened this issue 4 years ago • 6 comments

I'm really new to the Apache ecosystem, and currently I'm trying to ingest Kafka topics into HDFS using the HDFS3 sink connector. Most of my Apache environment was installed via Ambari (HDP). I execute it with:

    /usr/hdp/3.1.4.0-315/kafka/bin/connect-standalone.sh /etc/kafka/connect-standalone-json.properties /etc/kafka-connect-hdfs/quickstart-hdfs.properties
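Neither properties file is shown in this thread. For context, a minimal quickstart-hdfs.properties for the HDFS3 sink would look roughly like the sketch below; only the connector name (hdfs-sink, visible in the log) and the topic (testjson, visible in the HDFS paths) are confirmed by the thread, everything else is an assumed quickstart-style default, not the reporter's actual file:

    # Hypothetical sketch of /etc/kafka-connect-hdfs/quickstart-hdfs.properties;
    # only "name" and "topics" are confirmed by the logs in this thread.
    name=hdfs-sink
    connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
    tasks.max=1
    topics=testjson
    # Assumed placeholder; point this at your actual NameNode
    hdfs.url=hdfs://<namenode-host>:8020
    # Number of records written before a staging file is committed
    # from the +tmp directory to its final path
    flush.size=3

The +tmp directory that appears in the error below is where the connector stages files before committing them, so the _tmp.avro files in the later directory listing are these staging files.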

Then it returns the following error:

    Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro (inode 22757) [Lease. Holder: DFSClient_NONMAPREDUCE_496631619_33, pending creates: 1]
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2815)
            at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:591)
            at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2694)
            at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
            at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
            at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
            at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
            at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
            at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
            at org.apache.hadoop.ipc.Client.call(Client.java:1457)
            at org.apache.hadoop.ipc.Client.call(Client.java:1367)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
            at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
            at com.sun.proxy.$Proxy47.addBlock(Unknown Source)
            at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:510)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
            at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
            at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
            at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
            at com.sun.proxy.$Proxy48.addBlock(Unknown Source)
            at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
            ... 3 more
    [2020-06-23 10:35:30,240] ERROR WorkerSinkTask{id=hdfs-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
    org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
            at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
            at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
            at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
            at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
            at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
            at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.avro.AvroRuntimeException: already open
            at org.apache.avro.file.DataFileWriter.assertNotOpen(DataFileWriter.java:85)
            at org.apache.avro.file.DataFileWriter.setCodec(DataFileWriter.java:93)
            at io.confluent.connect.hdfs3.avro.AvroRecordWriterProvider$1.write(AvroRecordWriterProvider.java:59)
            at io.confluent.connect.hdfs3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:675)
            at io.confluent.connect.hdfs3.TopicPartitionWriter.write(TopicPartitionWriter.java:374)
            at io.confluent.connect.hdfs3.DataWriter.write(DataWriter.java:359)
            at io.confluent.connect.hdfs3.Hdfs3SinkTask.put(Hdfs3SinkTask.java:108)
            at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
            ... 10 more
    [2020-06-23 10:35:30,243] ERROR WorkerSinkTask{id=hdfs-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)

I noticed this line in particular:

    Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro (inode 22757)

So I went and checked, and the file is there:

    root@ambari:/home/hadoop# hdfs dfs -ls /user/root/topics/+tmp/testjson
    Found 1 items
    drwxr-xr-x   - root hdfs          0 2020-06-24 06:18 /user/root/topics/+tmp/testjson/partition=0
    root@ambari:/home/hadoop# hdfs dfs -ls /user/root/topics/+tmp/testjson/partition=0
    Found 2 items
    -rw-r--r--   3 root hdfs          0 2020-06-24 06:18 /user/root/topics/+tmp/testjson/partition=0/05cc9305-c370-44f8-8e9d-b311fb284e26_tmp.avro
    -rw-r--r--   3 root hdfs          0 2020-06-24 03:26 /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro

The files are always there. I don't think this is a permissions issue, because I've already set dfs.permissions.enabled to false. Please give me some suggestions, thank you so much.
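For anyone else debugging this, a few HDFS-side checks can help narrow it down. This is a sketch assuming Hadoop 3.x (what HDP 3.1.4 ships); the path is the one from the error above:

    # Confirm the value the cluster actually has for the permissions flag
    hdfs getconf -confKey dfs.permissions.enabled

    # List files the NameNode still considers open for write,
    # along with the client currently holding each lease (Hadoop 3.x)
    hdfs dfsadmin -listOpenFiles

    # If a staging file is stuck under a stale lease, force lease recovery on it
    hdfs debug recoverLease \
      -path /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro \
      -retries 3

Note that the checkLease failure above means the NameNode no longer matches that client to the file's lease, which usually points at a second writer on the same staging file or at the file being deleted and recreated under the client, rather than at permissions.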

yuliansen avatar Jun 25 '20 04:06 yuliansen

Have you found a solution?

leobenkel avatar Aug 26 '20 18:08 leobenkel

I ran into the same problem. Has anyone found a solution?

huynhphuong10284 avatar Oct 12 '20 02:10 huynhphuong10284

I never finished this, but things must keep going, so I decided to use NiFi instead of Confluent Connect. If you insist on it, I'll tell you that the HDFS connector required an enterprise license back then (and maybe still does), so you might need to pay for it or activate the 30-day trial. Good luck, guys.

yuliansen avatar Oct 14 '20 12:10 yuliansen

I am also facing the same problem with Hadoop 3.1.3 and Tez 0.10.0. Can anyone help me resolve this?

dhirendra31pandit avatar Mar 25 '21 17:03 dhirendra31pandit

Have you found a solution? I use Hadoop 2.8.5 and Spark 2.x.

mojirml avatar Nov 08 '21 13:11 mojirml

If you have found a solution, please share it.

AlirezaMoumivand avatar Mar 06 '24 13:03 AlirezaMoumivand