kafka-connect-hdfs
Hive integration is not currently supported with JSON format
format.class=io.confluent.connect.hdfs.json.JsonFormat
java.lang.UnsupportedOperationException: Hive integration is not currently supported with JSON format
at io.confluent.connect.hdfs.json.JsonFormat.getHiveFactory(JsonFormat.java:68)
at io.confluent.connect.hdfs.DataWriter.
Are you sure you need to post the stacktrace? It's a known issue.
In order to create a Hive table from JSON, you must know the schema ahead of time.
In order to know the schema, it must exist in the message. https://rmoff.net/2017/09/06/kafka-connect-jsondeserializer-with-schemas.enable-requires-schema-and-payload-fields/
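For reference, when the JsonConverter has schemas.enable set to true it expects each message to carry its own schema in a schema/payload envelope, roughly like this (the field names here are only illustrative):

```json
{
  "schema": {
    "type": "struct",
    "name": "example.record",
    "optional": false,
    "fields": [
      { "field": "id", "type": "int64", "optional": false },
      { "field": "name", "type": "string", "optional": true }
    ]
  },
  "payload": { "id": 42, "name": "alice" }
}
```

Without that envelope, the connector has no schema to hand to Hive.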
If a schema exists in the message, I believe you can actually use AvroFormat.
I use format.class as JSON, not Avro.
I understand that.
If you look at the code, JSON doesn't support Hive. https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/json/JsonFormat.java
If you look at the Avro or Parquet folders, there is Hive support.
Your Kafka topics don't need to contain Avro data in order to write Avro data to HDFS. Similarly, you couldn't put Parquet data into a topic anyway.
If you set value.converter to JSON with its schemas property enabled, then you can use either the Avro or Parquet format.class setting, and you should be able to get a Hive table.
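Concretely, that combination might look like the following connector config sketch (the HDFS URL, metastore URI, and topic name are placeholders, not values from this thread):

```properties
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=my-topic
hdfs.url=hdfs://namenode:8020
# JSON on the wire, with the schema embedded in each message
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
# Write Avro files to HDFS; AvroFormat supports Hive integration
format.class=io.confluent.connect.hdfs.avro.AvroFormat
hive.integration=true
hive.metastore.uris=thrift://hive-metastore:9083
```

The key point is that the converter (how messages are read from Kafka) and the format (how files are written to HDFS) are independent settings, so JSON-in-the-topic and Avro-on-HDFS can coexist.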