kafka-connect-hdfs
Kafka Connect HDFS connector
I'm starting with Kafka Connect, Hadoop/HDFS and Kerberos... so I'm probably missing some basic concepts. In the published documentation (https://docs.confluent.io/kafka-connect-hdfs/current/overview.html#limitations), the following limitation is listed: > The HDFS 2 Sink...
We are working with the connect-standalone script and these two configurations:

**Worker.properties**
- bootstrap.servers=127.0.0.1:9092
- key.converter=org.apache.kafka.connect.storage.StringConverter
- value.converter=io.confluent.connect.avro.AvroConverter
- key.converter.schemas.enable=false
- value.converter.schemas.enable=false
- key.converter.schema.registry.url=http://127.0.0.1:8081
- value.converter.schema.registry.url=http://127.0.0.1:8081
- internal.key.converter.schemas.enable=false
- internal.value.converter.schemas.enable=false
- offset.storage.file.filename=/tmp/connect.offsets
- offset.flush.interval.ms=10000
- offset.flush.timeout.ms=50000
- producer.max.request.size=15728640
- rest.port=9002
- plugin.path= My path to connectors...
kafka-connect-hdfs has several formats, such as `Parquet`, `AVRO`, and `String`, but with `StringFormat` we cannot use compression. Maybe we can enhance `StringFormat` to support snappy or gzip compression. My suggestion is to add a configuration analogous to `avro.codec`, maybe `string.codec`,...
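As a rough illustration of what a gzip-enabled `StringFormat` might do internally, here is a minimal, self-contained sketch; the class name and output file are hypothetical, and a local file stands in for the stream the connector would open on HDFS:

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipStringWriterSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for the stream the connector would open on the HDFS temp file
        // (e.g. a .txt.gz file instead of a plain .txt one).
        OutputStream raw = new FileOutputStream("records.txt.gz");
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new GZIPOutputStream(raw), StandardCharsets.UTF_8))) {
            // Each record value is written as one line, exactly like the
            // uncompressed StringFormat, just through a compressing stream.
            for (String record : new String[] {"first record", "second record"}) {
                writer.write(record);
                writer.newLine();
            }
        }
    }
}
```

Snappy would work the same way with the compressing stream swapped out; a `string.codec` value would simply select which wrapper to apply.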
My Hive version is 1.2.1 and my connector runs in k8s. I encounter errors like this:

    org.apache.thrift.TApplicationException: get_table failed: out of sequence response
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1218)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1204)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1208)...
We've run into an issue where, when the Kafka HDFS connector is stopped for any reason, an empty Parquet file is sometimes generated in HDFS. After the connect worker is started,...
There is a breaking change in Hive 2.3.0 in HiveMetastoreClient that causes an error when trying to get a table from a metastore running a lower version. The change is made in...
Is there a way to keep the original file name when storing in HDFS? We are using Kafka to move JSON files to HDFS, and we want the names of...
Will there be support for a Protobuf format any time in the future for the interface `io.confluent.connect.storage.format.Format`? It would be really convenient to write Protobuf data to HDFS.
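For what it's worth, the central piece such a format needs is a record writer that frames consecutive Protobuf messages so they can be split apart when the file is read back. Below is a minimal sketch, assuming record values arrive as `com.google.protobuf.Message` instances and that the connector hands the writer an output stream for the HDFS temp file; the class and method names are hypothetical, while `writeDelimitedTo` is a real method of the Protobuf Java runtime:

```java
import com.google.protobuf.Message;

import java.io.IOException;
import java.io.OutputStream;

public class ProtobufRecordWriterSketch {
    // Stand-in for the stream opened on the HDFS temp file.
    private final OutputStream out;

    public ProtobufRecordWriterSketch(OutputStream out) {
        this.out = out;
    }

    // Length-delimited framing lets consecutive messages in one file be
    // read back individually with parseDelimitedFrom on the consuming side.
    public void write(Message value) throws IOException {
        value.writeDelimitedTo(out);
    }

    public void close() throws IOException {
        out.flush();
        out.close();
    }
}
```

The remaining work would be wiring such a writer into the `Format` plumbing the other formats already implement.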
When I shut down Kafka Connect, I get errors like these:
```
[2021-07-22 18:14:00,035] DEBUG Closing TopicPartitionWriter testtopic-37 (io.confluent.connect.hdfs.TopicPartitionWriter:464)
[2021-07-22 18:14:00,035] DEBUG Discarding in progress tempfile hdfs://10.10.106.70:8020//data/test//+tmp/testtopic/202107162050/9a603280-828d-42c8-95f4-87692589604f_tmp.txt for testtopic-37 202107162050 (io.confluent.connect.hdfs.TopicPartitionWriter:467)
[2021-07-22 18:14:00,035] ERROR...
```
I've added a config for this plugin in order to specify the compression for Parquet files. The setting is called `parquet.codec` and can have the following values: none, snappy, gzip,...
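As an illustration of what the mapping behind such a setting likely looks like, here is a hedged sketch against Parquet's public `CompressionCodecName` enum, not the PR's actual code; the class and method names are hypothetical:

```java
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

import java.util.Locale;

public class ParquetCodecSketch {
    // Translates a 'parquet.codec' setting into Parquet's codec enum.
    static CompressionCodecName codecFor(String configured) {
        switch (configured.toLowerCase(Locale.ROOT)) {
            case "none":   return CompressionCodecName.UNCOMPRESSED;
            case "snappy": return CompressionCodecName.SNAPPY;
            case "gzip":   return CompressionCodecName.GZIP;
            default:
                throw new IllegalArgumentException("Unsupported parquet.codec: " + configured);
        }
    }

    public static void main(String[] args) {
        System.out.println(codecFor("gzip")); // prints GZIP
    }
}
```

The resulting codec would then presumably be handed to the Parquet writer when files are opened, e.g. via the writer builder's `withCompressionCodec(...)`.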