kafka-connect-hdfs
Error when trying to consume data on kafka produced by debezium
I'm getting the following error while trying to import data from Kafka into HDFS.
The data was produced by a Debezium connector.
The data is in Avro format.
Oddly enough, the same data can be consumed by kafka-avro-console-consumer with no error.
```
[2018-09-25 13:54:41,762] ERROR WorkerSinkTask{id=hdfs-debezium} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.apache.kafka.connect.errors.SchemaBuilderException: Invalid default value
	at org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:131)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1562)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1467)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1467)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1443)
	at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1323)
	at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:1047)
	at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:88)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:454)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:287)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:198)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:166)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Invalid Java object for schema type BYTES: class [B for field: "null"
	at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:240)
	at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:209)
	at org.apache.kafka.connect.data.SchemaBuilder.defaultValue(SchemaBuilder.java:129)
	... 19 more
```
I see this as well when consuming Debezium-generated data.
Is there a solution? I am stuck here too.
The stacktrace does not seem specific to HDFS Connect.
Can one of you please update this issue with an SSCCE (a short, self-contained, correct example that reproduces the error)?
The best way to handle this is with an SMT to filter out the offending records, or to route them to a dead-letter queue (DLQ).
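As a sketch of the DLQ approach: Kafka Connect has built-in sink-side error handling (the `errors.*` properties, available since Apache Kafka 2.0) that can tolerate conversion failures like this one and divert the failing records to a dead-letter topic instead of killing the task. The connector, topic, and HDFS URL names below are placeholders, not taken from this issue:

```properties
# Hypothetical HDFS sink config illustrating Connect's dead-letter-queue
# error handling; connector name, topic, and hdfs.url are placeholders.
name=hdfs-debezium
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=dbserver1.inventory.customers
hdfs.url=hdfs://namenode:8020

# Tolerate record conversion/transformation errors instead of failing the task...
errors.tolerance=all
# ...log them for debugging...
errors.log.enable=true
errors.log.include.messages=true
# ...and send the failing records to a dead-letter topic for later inspection.
errors.deadletterqueue.topic.name=dlq-hdfs-debezium
errors.deadletterqueue.context.headers.enable=true
```

Records that fail in the converter (as in the stack trace above, during `WorkerSinkTask.convertMessages`) would then land in the DLQ topic with error context in the record headers, while valid records continue flowing to HDFS.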