Constant rebalancing
Description
We're seeing constant consumer group rebalancing when using Event Hubs through its Kafka interface. The consumers never process a single message.
How to reproduce
Our code uses this configuration file:
spring:
  cloud:
    stream:
      default:
        consumer:
          partitioned: true
          configuration:
            max.poll.interval.ms: 300000
            max.poll.records: 20
            session.timeout.ms: 300000
            heartbeat.timeout.ms: 100000
            auto.offset.reset: earliest
            request.timeout.ms: 60000
      kafka:
        binder:
          brokers: ***
          consumerProperties:
            max.poll.interval.ms: 300000
            max.poll.records: 20
            session.timeout.ms: 300000
            heartbeat.timeout.ms: 100000
            auto.offset.reset: earliest
            request.timeout.ms: 60000
          configuration:
            security:
              protocol: SASL_SSL
            sasl:
              mechanism: PLAIN
              jaas:
                config: org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="***";
Our processing normally takes about 2 seconds.
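As a rough sanity check on the timing settings above, the sketch below estimates the worst-case time a consumer spends between two poll() calls and compares it with max.poll.interval.ms. It assumes the 2 seconds applies per message and that the records of a batch are processed one after another; the report does not state either, so both are assumptions. The class and method names are hypothetical and exist only for this illustration.

```java
public class PollBudget {

    // Worst-case time to process one poll() batch, assuming records are
    // handled sequentially and each takes perRecordMillis.
    static long worstCaseBatchMillis(int maxPollRecords, long perRecordMillis) {
        return (long) maxPollRecords * perRecordMillis;
    }

    // True when a full batch finishes before max.poll.interval.ms elapses.
    static boolean fitsPollInterval(int maxPollRecords, long perRecordMillis,
                                    long maxPollIntervalMillis) {
        return worstCaseBatchMillis(maxPollRecords, perRecordMillis) < maxPollIntervalMillis;
    }

    public static void main(String[] args) {
        int maxPollRecords = 20;               // max.poll.records from the configuration
        long maxPollIntervalMillis = 300_000L; // max.poll.interval.ms from the configuration
        long perRecordMillis = 2_000L;         // assumption: 2 seconds per message

        long batch = worstCaseBatchMillis(maxPollRecords, perRecordMillis);
        System.out.println("worst-case batch time: " + batch + " ms");
        System.out.println("fits in max.poll.interval.ms: "
                + fitsPollInterval(maxPollRecords, perRecordMillis, maxPollIntervalMillis));
    }
}
```

Under these assumptions a full batch takes about 40 seconds, well inside the 300-second poll interval, so slow processing alone would not be expected to trigger the rebalances.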
Has it worked previously?
Previously we used this other configuration:
spring:
  cloud:
    stream:
      kafka:
        default:
          consumer:
            configuration:
              max.poll.interval.ms: 3000000
              max.poll.records: 20
        binder:
          brokers: ***
          configuration:
            security:
              protocol: SASL_SSL
            sasl:
              mechanism: PLAIN
              jaas:
                config: org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString"
This works, but only 1 of our 5 nodes processes messages. The "selected" node runs fine, without any obvious problem, but the load is not balanced across the other nodes.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed or where adequate information has not been provided.
Please provide the relevant information for the following items:
- [x] SDK (include version info): spring-cloud-stream 3.1.1
- [x] Sample you're having trouble with: N/A
- [x] If using Apache Kafka Java clients or a framework that uses Apache Kafka Java clients, version: spring-cloud-stream 3.1.1
- [x] Kafka client configuration: at the top
- [x] Namespace and EventHub/topic name: perf-eventhub-namespace-hgu.servicebus.windows.net / device-twin-ip-changes
- [x] Consumer or producer failure: Consumer failure
- [x] Timestamps in UTC
- [x] group.id or client.id: $Default
- [x] Logs provided (with debug-level logging enabled if possible, e.g. log4j.rootLogger=DEBUG) or exception call stack
- [x] Standalone repro: Willing/able to send scenario to repro issue
- [x] Operating system: Docker Java 11 image from AdoptOpenJDK (Ubuntu)
- [x] Critical issue
If this is a question on basic functionality, please verify the following:
- [x] Port 9093 should not be blocked by firewall ("broker cannot be found" errors)
- [x] Pinging FQDN should return cluster DNS resolution (e.g. $ ping namespace.servicebus.windows.net returns ~ns-eh2-prod-am3-516.cloudapp.net [13.69.64.0])
- [x] Namespace should be either Standard or Dedicated tier, not Basic (TopicAuthorization errors)
Logs
https://gist.github.com/aarroyoc/8c17814ca3ee843ed94d5357d60e6a3c