kafka-connect-hdfs

Docs are unclear about the relationship between tasks.max and the number of consumers assigned to topic partitions

Open abhisheksahani opened this issue 6 years ago • 5 comments

Hi, we have 25 topics, each with 2 partitions. We created a connector config using topics.regex so that the connector consumes from all 25 topics, with tasks.max set to 50 (i.e., one unique consumer per partition). But when we describe the consumer group, only two unique consumers are assigned to the 50 partitions.

Here's the config:

{
  "name": "testConnectorfinalTest04",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "50",
    "hdfs.url": "hdfs://...:9000",
    "hadoop.conf.dir": "/home/tomcat/hadoop/etc/hadoop",
    "hadoop.home": "/home/tomcat/hadoop/",
    "hive.conf.dir": "/home/tomcat/hive/conf",
    "hive.home": "/home/hadoop/apache-hive-2.3.5-bin/",
    "flush.size": "1000",
    "format.class": "io.confluent.connect.hdfs.parquet.ParquetFormat",
    "hive.integration": "true",
    "hive.database": "testfinal04",
    "hive.metastore.uris": "thrift://...:9083",
    "schema.compatibility": "BACKWARD",
    "partitioner.class": "io.confluent.connect.storage.partitioner.FieldPartitionerWithTimePartition",
    "locale": "en",
    "topics.regex": "datapipelinefinaltest...topic",
    "partition.field.name": "tenant,groupid,project,name,year,month,day,hour",
    "partition.field.name.with.time": "systime",
    "timezone": "Asia/Calcutta",
    "rotate.schedule.interval.ms": "120000"
  }
}

The consumer-group describe output shows that only two unique consumers are assigned to the partitions:

TOPIC                                                                      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                       HOST  CLIENT-ID
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_23.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_12.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_16.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_20.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_18.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_11.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_22.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_2.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_25.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_8.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_10.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_5.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_14.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_9.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_1.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_21.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_3.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_13.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_6.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_24.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_7.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_4.topic   1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_19.topic  1          2499            2500            1    consumer-30-1162f4c6-4267-49bf-8b4c-d90ca80f2876  /...  consumer-30
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_13.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_12.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_20.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_8.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_4.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_2.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_23.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_21.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_16.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_14.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_9.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_19.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_5.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_25.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_3.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_18.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_7.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_10.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_24.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_6.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_22.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_1.topic   0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29
datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_11.topic  0          2499            2500            1    consumer-29-2a82f7da-2692-4d9a-955c-ec8008c8e027  /...  consumer-29

abhisheksahani • Oct 20 '19 21:10

The tasks.max configuration comes from the Kafka Connect framework and specifies the maximum number of tasks to create for the connector, but fewer tasks may be created. - https://docs.confluent.io/current/connect/managing/configuring.html

In this case, the framework decides that it needs only 2 tasks to handle the load, so each task handles 25 topic-partitions. That is fine from the framework's perspective, since the in-order guarantee for each partition is not violated. If you know more about your load pattern and explicitly want one unique task for each topic-partition, you can try splitting the config into separate configs, one per topic, each with tasks.max set to 2.
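The per-topic split suggested above could be scripted rather than written by hand. The following is a minimal Python sketch, assuming you submit each generated payload yourself (for example with a POST to /connectors on the Kafka Connect REST API). The connector names and the trimmed BASE_CONFIG are hypothetical; the remaining HDFS, Hive, and partitioner settings from the original config would need to be copied in. Each connector uses topics with a single topic name in place of topics.regex.

```python
import json

# Hypothetical shared settings, trimmed from the config posted above.
# Copy the remaining hdfs.*, hive.*, and partitioner settings in as needed.
BASE_CONFIG = {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "flush.size": "1000",
    "format.class": "io.confluent.connect.hdfs.parquet.ParquetFormat",
}

TOPIC_TEMPLATE = (
    "datapipelinefinaltest.5da59e664cedfd00090d3757.dataPipeLineEvent_{}.topic"
)


def per_topic_configs(topic_count, partitions_per_topic):
    """Build one connector definition per topic.

    tasks.max equals the topic's partition count, so each task can be
    assigned exactly one partition of that topic.
    """
    configs = []
    for i in range(1, topic_count + 1):
        config = dict(BASE_CONFIG)
        # A single explicit topic replaces the original topics.regex.
        config["topics"] = TOPIC_TEMPLATE.format(i)
        config["tasks.max"] = str(partitions_per_topic)
        # Hypothetical naming scheme for the generated connectors.
        configs.append({"name": f"hdfs-sink-topic-{i}", "config": config})
    return configs


if __name__ == "__main__":
    # Print one JSON payload per line, ready to submit to the REST API.
    for payload in per_topic_configs(25, 2):
        print(json.dumps(payload))
```

The trade-off is operational: 25 connectors instead of one, each managed and monitored separately.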

ncliang • Nov 04 '19 05:11

Keep in mind that this issue has little to do with HDFS, as both tasks.max and the consumer assignment are properties of the upstream Apache Kafka project.

I'd suggest closing this issue and moving your questions to the Kafka users mailing list.

OneCricketeer • May 02 '20 01:05

I have 1 task, and the topic has 140 partitions. I see a lot of consumer lag, more than 20 hours, which goes beyond our retention time, so we are losing data. How do I find the number of tasks I need to handle the 140 partitions?

sxganapa • Aug 27 '20 18:08

You can have up to one task per partition.
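Within that cap, the right number depends on throughput: more tasks than partitions adds no parallelism, because a partition is consumed by at most one task in the group. A rough sizing sketch follows, assuming you have measured the incoming message rate and what a single task sustains when writing to HDFS. The function name and all inputs are hypothetical, and tasks are spread across the Connect workers, so the workers also need the capacity to run them.

```python
import math


def suggested_tasks(partitions, incoming_rate, per_task_rate,
                    backlog=0, catch_up_seconds=None):
    """Rough sink-task count (hypothetical helper, not part of Kafka Connect).

    partitions:       total partitions the connector consumes
    incoming_rate:    messages/sec produced to those partitions (measured)
    per_task_rate:    messages/sec one task sustains into HDFS (measured)
    backlog:          current total lag, in messages
    catch_up_seconds: how quickly the existing backlog should be drained
    """
    required_rate = incoming_rate
    if backlog and catch_up_seconds:
        # Extra throughput needed to drain the existing lag in time.
        required_rate += backlog / catch_up_seconds
    needed = math.ceil(required_rate / per_task_rate)
    # At least one task; tasks beyond the partition count would sit idle.
    return max(1, min(needed, partitions))


print(suggested_tasks(140, 50_000, 2_000))  # 25
print(suggested_tasks(140, 10_000, 2_000,
                      backlog=36_000_000, catch_up_seconds=3_600))  # 10
```

The backlog term matters for the lag described above: keeping up with the incoming rate alone never reduces existing lag.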

OneCricketeer • Aug 27 '20 18:08

Hi @OneCricketeer and @ncliang, can you please help me solve the problem above, "50 tasks for 25 topics having 2 partitions each"?
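One possible explanation, not stated elsewhere in this thread, is that the original output matches the consumer's default range assignment strategy (RangeAssignor). That strategy divides each topic separately among the group members, sorted by member ID. A 2-partition topic therefore only ever reaches the first two members, and when every topic has 2 partitions, those same two members receive everything. The simplified Python model below is a sketch, not Kafka's actual implementation, and the topic and consumer names are hypothetical. It compares range assignment with round-robin assignment for 25 topics × 2 partitions and 50 consumers.

```python
from collections import defaultdict


def range_assign(consumers, partitions_per_topic):
    """Simplified model of range assignment: each topic is split on its own
    among the consumers, sorted by member ID."""
    members = sorted(consumers)
    assignment = defaultdict(list)
    for topic, count in sorted(partitions_per_topic.items()):
        per_member, extra = divmod(count, len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` members get one additional partition.
            length = per_member + (1 if i < extra else 0)
            for p in range(start, start + length):
                assignment[member].append((topic, p))
            start += length
    return dict(assignment)


def round_robin_assign(consumers, partitions_per_topic):
    """Simplified model of round-robin assignment when every consumer
    subscribes to the same topics: all partitions are dealt out in turn."""
    members = sorted(consumers)
    assignment = defaultdict(list)
    all_partitions = [(t, p)
                      for t, n in sorted(partitions_per_topic.items())
                      for p in range(n)]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return dict(assignment)


# Hypothetical names: 25 topics x 2 partitions, 50 task consumers.
topics = {f"dataPipeLineEvent_{i}.topic": 2 for i in range(1, 26)}
consumers = [f"consumer-{n}" for n in range(29, 79)]

ranged = range_assign(consumers, topics)
print(len(ranged))  # 2: only consumer-29 and consumer-30 get partitions
rr = round_robin_assign(consumers, topics)
print(len(rr))      # 50: one partition per consumer
```

If this is the cause, one option is to switch the sink consumers to round-robin assignment: set consumer.partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor in the worker config, or, on Kafka 2.3 and later, add "consumer.override.partition.assignment.strategy": "org.apache.kafka.clients.consumer.RoundRobinAssignor" to the connector config, which requires the workers' connector.client.config.override.policy to permit it (for example, All). Check the Kafka Connect documentation for your version before relying on this.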

govinda-raj • Nov 03 '22 18:11