kafka-connect-hdfs

Hive integration is not currently supported with JSON format

Open · chengang2 opened this issue on Sep 19, 2018 · 3 comments

format.class=io.confluent.connect.hdfs.json.JsonFormat

java.lang.UnsupportedOperationException: Hive integration is not currently supported with JSON format
    at io.confluent.connect.hdfs.json.JsonFormat.getHiveFactory(JsonFormat.java:68)
    at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:292)
    at io.confluent.connect.hdfs.DataWriter.<init>(DataWriter.java:101)
    at io.confluent.connect.hdfs.HdfsSinkTask.start(HdfsSinkTask.java:82)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:301)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:190)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

chengang2 · Sep 19 '18 02:09

Are you sure you need to post the stacktrace? It's a known issue.

In order to create a Hive table from JSON, you must know the schema ahead of time.

In order to know the schema, it must be embedded in the message itself, as in the example below: https://rmoff.net/2017/09/06/kafka-connect-jsondeserializer-with-schemas.enable-requires-schema-and-payload-fields/
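For illustration, a JSON message with an embedded schema uses the JsonConverter schema/payload envelope. This is only a sketch; the record fields (id, name) are made-up placeholders, not anything from this issue:

{
  "schema": {
    "type": "struct",
    "name": "example.Record",
    "optional": false,
    "fields": [
      { "field": "id", "type": "int64", "optional": false },
      { "field": "name", "type": "string", "optional": true }
    ]
  },
  "payload": {
    "id": 42,
    "name": "example"
  }
}

With schemas.enable=false, or with plain JSON that lacks this envelope, the connector has no schema from which to derive a Hive table.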

If a schema exists in the message, I believe you can actually use AvroFormat.

OneCricketeer · Sep 19 '18 04:09

I'm using format.class set to JSON, not Avro.

chengang2 · Sep 19 '18 07:09

I understand that.

If you look at the code, JSON doesn't support Hive. https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/json/JsonFormat.java

If you look at the Avro or Parquet folders, there is Hive support.

Your Kafka topics don't need to contain Avro data in order for the connector to write Avro data to HDFS. Similarly, you couldn't put Parquet data into a topic anyway.

If you set value.converter to the JSON converter and enable its schemas property, then you can use either the Avro or Parquet format.class setting, and you should be able to get a Hive table.
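As a rough sketch of that suggestion (not a config verified against this issue; the connector name, topic, HDFS URL, and metastore URI below are placeholders), a connector config along these lines might look like:

# Sketch only -- name, topic, and URLs are placeholders
name=hdfs-sink-json-to-hive
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=my-topic
hdfs.url=hdfs://namenode:8020
flush.size=3

# Topic contains JSON with embedded schemas (schema/payload envelope)
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

# Files are written as Avro, which does support Hive integration
format.class=io.confluent.connect.hdfs.avro.AvroFormat
hive.integration=true
hive.metastore.uris=thrift://hive-metastore:9083
schema.compatibility=BACKWARD

If I remember the connector docs correctly, hive.integration also requires schema.compatibility to be BACKWARD, FORWARD, or FULL rather than NONE.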

OneCricketeer · Sep 19 '18 13:09