kafka-connect-hdfs
Docs are unclear on the relation between tasks.max and the number of consumers attached to topic partitions
Hi, we have 25 topics, each with 2 partitions. We created a connector config using topics.regex so that the connector consumes from all 25 topics, with tasks.max set to 50 (i.e. one unique consumer per partition). But when we describe the consumer group, only two unique consumers are attached to the 50 partitions.
Here's the config:

```json
{
  "name": "testConnectorfinalTest04",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "50",
    "hdfs.url": "hdfs://...:9000",
    "hadoop.conf.dir": "/home/tomcat/hadoop/etc/hadoop",
    "hadoop.home": "/home/tomcat/hadoop/",
    "hive.conf.dir": "/home/tomcat/hive/conf",
    "hive.home": "/home/hadoop/apache-hive-2.3.5-bin/",
    "flush.size": "1000",
    "format.class": "io.confluent.connect.hdfs.parquet.ParquetFormat",
    "hive.integration": "true",
    "hive.database": "testfinal04",
    "hive.metastore.uris": "thrift://...:9083",
    "schema.compatibility": "BACKWARD",
    "partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitionerWithTimePartition",
    "locale": "en",
    "topics.regex": "datapipelinefinaltest...topic",
    "partition.field.name": "tenant,groupid,project,name,year,month,day,hour",
    "partition.field.name.with.time": "systime",
    "timezone": "Asia/Calcutta",
    "rotate.schedule.interval.ms": "120000"
  }
}
```
Describing the consumer group shows only two unique consumers assigned to the partitions:

```text
TOPIC                                                                      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                       HOST  CLIENT-ID
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_23.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_12.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_16.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_20.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_18.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_11.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_22.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_2.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_25.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_8.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_10.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_5.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_14.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_9.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_1.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_21.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_3.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_13.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_6.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_24.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_7.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_4.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_19.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_13.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_12.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_20.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_8.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_4.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_2.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_23.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_21.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_16.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_14.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_9.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_19.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_5.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_25.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_3.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_18.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_7.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_10.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_24.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_6.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_22.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_1.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_11.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
```
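One hedged way to read the output above: every partition 0 went to consumer-29 and every partition 1 went to consumer-30. That is the pattern a range-style assignment produces when each topic has fewer partitions than there are consumers in the group, because partitions are split per topic and the first consumers in sorted order receive them. The sketch below is a simplified model, not Kafka's actual assignor code. It assumes all 50 tasks joined the group and use the consumer's default RangeAssignor, and it uses hypothetical client IDs consumer-29 through consumer-78. A round-robin model is included for contrast.

```python
from typing import Dict, List

PREFIX = "datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_"


def range_assign(consumers: List[str], topics: Dict[str, int]) -> Dict[str, List[str]]:
    """Simplified RangeAssignor: per topic, split partitions into contiguous
    ranges over the consumers sorted by ID; earlier consumers get the extras."""
    members = sorted(consumers)
    assignment: Dict[str, List[str]] = {m: [] for m in members}
    for topic, num_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            count = per_consumer + (1 if i < extra else 0)
            assignment[member].extend(f"{topic}-{p}" for p in range(start, start + count))
            start += count
    return assignment


def round_robin_assign(consumers: List[str], topics: Dict[str, int]) -> Dict[str, List[str]]:
    """Simplified RoundRobinAssignor: deal all topic-partitions, in sorted order,
    to the sorted consumers one at a time (every consumer subscribes to every topic)."""
    members = sorted(consumers)
    assignment: Dict[str, List[str]] = {m: [] for m in members}
    partitions = [f"{t}-{p}" for t, n in sorted(topics.items()) for p in range(n)]
    for i, tp in enumerate(partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment


def busy(assignment: Dict[str, List[str]]) -> Dict[str, int]:
    """Consumers that received at least one partition, with their partition count."""
    return {m: len(tps) for m, tps in assignment.items() if tps}


# Hypothetical client IDs for 50 task consumers; only their sort order matters here.
consumers = [f"consumer-{n}" for n in range(29, 79)]
topics = {f"{PREFIX}{i}.topic": 2 for i in range(1, 26)}  # 25 topics x 2 partitions

print(busy(range_assign(consumers, topics)))             # {'consumer-29': 25, 'consumer-30': 25}
print(len(busy(round_robin_assign(consumers, topics))))  # 50
```

If this reading holds, one option worth testing (Kafka 2.3 or later, with the worker's connector.client.config.override.policy allowing it) is adding consumer.override.partition.assignment.strategy set to org.apache.kafka.clients.consumer.RoundRobinAssignor to the connector config.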
The tasks.max configuration comes from the framework and specifies the maximum number of tasks to create for the connector, but fewer tasks may be created. (https://docs.confluent.io/current/connect/managing/configuring.html)
In this case, the framework decides that it only needs 2 tasks to handle the load, so each task handles 25 topic-partitions. That is fine from the framework's perspective, since each partition's in-order guarantee is not violated. If you know more about your load pattern and want exactly one task per topic-partition, you can try breaking the config up into separate configs, one per topic, each with tasks.max set to 2.
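Splitting the regex connector into one connector per topic can be scripted. A minimal sketch, assuming the 25 topic names follow the pattern seen in the describe output (dataPipeLineEvent_1 through dataPipeLineEvent_25) and using a hypothetical connector name prefix hdfs-sink-. It builds one {"name", "config"} body per topic, which is the shape the Kafka Connect REST API accepts on POST /connectors; the elided hosts ("...") are kept as in the original and still need real values.

```python
import json

# Shared settings copied from the connector config in this issue.
# Hosts remain elided ("...") exactly as in the original; fill them in before use.
BASE_CONFIG = {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "hdfs.url": "hdfs://...:9000",
    "hadoop.conf.dir": "/home/tomcat/hadoop/etc/hadoop",
    "hadoop.home": "/home/tomcat/hadoop/",
    "hive.conf.dir": "/home/tomcat/hive/conf",
    "hive.home": "/home/hadoop/apache-hive-2.3.5-bin/",
    "flush.size": "1000",
    "format.class": "io.confluent.connect.hdfs.parquet.ParquetFormat",
    "hive.integration": "true",
    "hive.database": "testfinal04",
    "hive.metastore.uris": "thrift://...:9083",
    "schema.compatibility": "BACKWARD",
    "partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitionerWithTimePartition",
    "locale": "en",
    "partition.field.name": "tenant,groupid,project,name,year,month,day,hour",
    "partition.field.name.with.time": "systime",
    "timezone": "Asia/Calcutta",
    "rotate.schedule.interval.ms": "120000",
}

TOPIC_TEMPLATE = "datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_{}.topic"


def per_topic_configs(base, topic_ids, tasks_per_connector=2, name_prefix="hdfs-sink-"):
    """Build one connector definition per topic instead of one topics.regex connector.

    name_prefix is a hypothetical naming scheme; tasks_per_connector=2 matches
    the 2 partitions per topic, so each task can own one partition.
    """
    definitions = []
    for topic_id in topic_ids:
        config = dict(base)               # shallow copy; all values are plain strings
        config.pop("topics.regex", None)  # each connector names its topic explicitly
        config["topics"] = TOPIC_TEMPLATE.format(topic_id)
        config["tasks.max"] = str(tasks_per_connector)
        definitions.append({"name": f"{name_prefix}{topic_id}", "config": config})
    return definitions


configs = per_topic_configs(BASE_CONFIG, range(1, 26))
print(len(configs))  # 25
print(json.dumps(configs[0], indent=2))
```

Each sink connector gets its own consumer group (by default named connect- followed by the connector name), so after the split each group should show two consumers sharing that topic's two partitions.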
Keep in mind that this issue has little to no relation to HDFS, since both tasks.max and the consumer assignment are properties of the upstream Apache Kafka project.
I'd suggest closing this issue and moving your questions to the Kafka users mailing list.
I have 1 task and the topic has 140 partitions. I see a lot of consumer lag, more than 20 hours, which goes beyond our retention time, so we are losing data. How do I find the number of tasks I need to handle the 140 partitions?
You can have at most one task per partition, so with 140 partitions up to 140 tasks can do useful work.
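Choosing tasks.max for the 140-partition case can be approached as simple arithmetic. A rough sketch, assuming per-task throughput scales linearly with the number of tasks (it may not if HDFS or the worker is the bottleneck), with made-up rates and a hypothetical 1.5x headroom factor. The only number taken from the question is the 140 partitions, which caps the useful task count.

```python
import math


def tasks_needed(incoming_per_sec, per_task_per_sec, num_partitions, headroom=1.5):
    """Estimate how many sink tasks to request via tasks.max.

    Aggregate task throughput should exceed the incoming rate (with headroom
    so existing lag can drain), but tasks beyond the partition count sit idle
    because each partition is read by at most one consumer in the group.
    """
    if per_task_per_sec <= 0:
        raise ValueError("per-task throughput must be positive")
    wanted = math.ceil(incoming_per_sec * headroom / per_task_per_sec)
    return max(1, min(wanted, num_partitions))


def hours_to_drain(lag_messages, tasks, per_task_per_sec, incoming_per_sec):
    """Hours until the backlog is consumed, assuming throughput scales linearly
    with tasks; returns math.inf if consumption cannot outpace production."""
    net_per_sec = tasks * per_task_per_sec - incoming_per_sec
    if net_per_sec <= 0:
        return math.inf
    return lag_messages / net_per_sec / 3600


# Hypothetical measurements: replace with your producer rate and the rate one
# task achieves today (e.g. the change in CURRENT-OFFSET over a fixed interval).
tasks = tasks_needed(incoming_per_sec=20_000, per_task_per_sec=1_500, num_partitions=140)
print(tasks)                                            # 20
print(hours_to_drain(72_000_000, tasks, 1_500, 20_000))  # 2.0
```

If the estimate comes out at or above the partition count, adding tasks no longer helps; the remaining levers are per-task throughput or more partitions.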
Hi @OneCricketeer and @ncliang, can you please help me solve the problem above, i.e. "50 tasks for 25 topics with 2 partitions each"?