
Error when trying to consume data on kafka produced by debezium

Open lexnjugz opened this issue 7 years ago • 4 comments

I am getting the following error while trying to import data from Kafka into HDFS. The data was produced by a Debezium connector and is in Avro format. Oddly enough, the same data can be consumed by `kafka-avro-console-consumer` with no error.

```
[2018-09-25 13:54:41,762] ERROR WorkerSinkTask{id=hdfs-debezium} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.apache.kafka.connect.errors.SchemaBuilderException: Invalid default value
	at org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:131)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1562)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1467)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1467)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1323)
	at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1047)
	at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:88)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:454)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:287)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type BYTES: class [B for field: "null"
	at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:240)
	at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:209)
	at org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:129)
	... 19 more
```
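One note for anyone hitting the same `DataException`: a likely trigger is a `DECIMAL`/`NUMERIC` source column, which Debezium by default encodes as Avro `bytes` with a byte-array default value that the Avro converter then fails to validate against the Connect schema. A possible workaround, sketched below under the assumption that the source is a Debezium relational connector (the connector class shown is just an example), is to change how decimals are emitted via Debezium's real `decimal.handling.mode` option:

```
# Excerpt of a hypothetical Debezium source connector config.
# decimal.handling.mode=string (or double) makes Debezium emit
# decimals as strings (or doubles) instead of Avro bytes, which
# sidesteps the byte-array default value entirely.
connector.class=io.debezium.connector.mysql.MySqlConnector
decimal.handling.mode=string
```

Note that changing this setting changes the schema of affected fields, so downstream consumers of the topic need to handle the new type.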

lexnjugz avatar Sep 25 '18 14:09 lexnjugz

I see this as well when consuming Debezium-generated data.

jhiza avatar Oct 02 '18 18:10 jhiza

Is there a solution? I am stuck here too.

jiancaiHub avatar Nov 07 '18 09:11 jiancaiHub

The stacktrace does not seem specific to HDFS Connect.

Can one of you please update this issue with an SSCCE (a short, self-contained, correct example)?

OneCricketeer avatar Nov 09 '18 00:11 OneCricketeer

The best way to handle this is with an SMT (Single Message Transform) to filter these records out, or to route them to a dead letter queue (DLQ).
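For the DLQ route, a minimal sketch using Kafka Connect's built-in error-handling settings (available for sink connectors since Apache Kafka 2.0, per KIP-298) would add something like the following to the sink connector config; the topic name here is hypothetical:

```
# Tolerate conversion errors instead of killing the task, and
# write the failing records (with error context headers) to a
# dead letter queue topic for later inspection.
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq-hdfs-debezium
errors.deadletterqueue.context.headers.enable=true
```

Be aware that `errors.tolerance=all` silently skips bad records unless you actually monitor the DLQ topic.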

cyrusv avatar Apr 02 '19 20:04 cyrusv