Flink parallelism causing uneven TaskManager utilization
I am deploying a scenario from Nussknacker to Flink where the source is Kafka and the sink sends requests to an HTTP endpoint. The setup has 2 Kafka broker nodes and 2 TaskManagers: one TaskManager has 6 slots and the other has 5 slots.
I ran the job with different parallelism configurations, and I noticed unexpected load distribution across TaskManagers:
- Case 1:
Job Parallelism = 5
Observation: Only one TaskManager slot is being utilized, and the entire load runs on a single TaskManager.
- Case 2:
Job Parallelism = 8
Observation: Two TaskManager slots are being utilized, but still the entire load is handled by a single TaskManager.
- Case 3:
Observation: Again, only two TaskManager slots are being utilized, and the load is concentrated on a single TaskManager.
However, when I set job parallelism = 6 with 3 slots in each TaskManager, the load is distributed properly across both TaskManagers.
Question: Why is the load not evenly distributed across TaskManagers? Is this related to the number of Kafka partitions, operator chaining in Flink/Nussknacker, or some scheduling limitation? What’s the recommended configuration to ensure even distribution and better throughput in this setup?
This is typical Flink behaviour with the default configuration. Try switching the https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#taskmanager-load-balance-mode configuration option to SLOTS. The old name of this configuration option was cluster.evenly-spread-out-slots.
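As a rough sketch, the setting could look like this in the Flink configuration file. The file name (config.yaml or flink-conf.yaml, depending on the Flink version) and the flat key style are assumptions about your deployment. Only one of the two keys should be needed, and the TaskManagers and JobManager have to be restarted for it to take effect.

```yaml
# Newer Flink versions: ask the scheduler to spread slots across all TaskManagers
taskmanager.load-balance.mode: SLOTS

# Older Flink versions that only know the previous option name (assumption: use this instead)
# cluster.evenly-spread-out-slots: true
```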
Is this related to the number of Kafka partitions, operator chaining in Flink/Nussknacker, or some scheduling limitation? What’s the recommended configuration to ensure even distribution and better throughput in this setup?
The number of Kafka partitions is important. It should be at least equal to the configured scenario parallelism; otherwise some parallel source instances have no partition to read and stay idle.
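To see why the partition count matters, here is a minimal Python sketch that counts how many partitions each parallel source instance would get. It assumes a simplified round-robin assignment (partition index modulo parallelism), which is not Flink's exact assignment logic, and the function names and the example numbers are hypothetical, since the partition count of your topic is not given.

```python
def partitions_per_subtask(num_partitions: int, parallelism: int) -> list[int]:
    """Return how many partitions each source subtask reads.

    Simplified model: partition p goes to subtask p % parallelism.
    This only illustrates the counting argument, not Flink's real assigner.
    """
    counts = [0] * parallelism
    for partition in range(num_partitions):
        counts[partition % parallelism] += 1
    return counts


def idle_subtasks(num_partitions: int, parallelism: int) -> list[int]:
    """Return the indices of source subtasks that receive no partition at all."""
    counts = partitions_per_subtask(num_partitions, parallelism)
    return [index for index, count in enumerate(counts) if count == 0]


if __name__ == "__main__":
    # Hypothetical topic sizes, checked against the parallelism values from the question
    for partitions in (2, 6, 8):
        for parallelism in (5, 6, 8):
            print(
                f"partitions={partitions} parallelism={parallelism} "
                f"idle source subtasks={idle_subtasks(partitions, parallelism)}"
            )
```

In this model, any source subtask beyond the partition count reads nothing, no matter how the slots are spread across the TaskManagers.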
Other things to check:
- Is the data distributed evenly across the Kafka partitions?
- Is the key used in the stateful stream processing picked correctly?