confluent-kafka-dotnet
MSK connectivity issue during AWS Security Patch Updates
Description
The consumer fails to consume events from Kafka on AWS MSK while MSK security patch updates are being applied.
How to reproduce
- Launch a consumer application using AWS MSK as the Kafka infrastructure.
- Wait for MSK to roll out updates or apply a security patch automatically, or apply one manually if possible.
Additional Details
On further observation while debugging, Error.Code is returned as Local_Transport.
Checklist
- Program: Basic consumer application (regularly consumes events)
- Confluent.Kafka nuget version: 2.2.0
- Apache Kafka version: 2.8.1
- Client configuration: EnableAutoCommit = false; EnableAutoOffsetStore = false;
Info Logs:
ssl://b-1.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaws.com:9094/1: Connect to ipv4#<ip-address-1>:9094 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
[thrd:ssl://b-1.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaw]: ssl://b-1.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaws.com:9094/1: Connect to ipv4#<ip-address-1>:9094 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
2/2 brokers are down
ssl://b-1.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaws.com:9094/1: Disconnected: verify that security.protocol is correctly configured, broker might require SASL authentication (after -1616061376ms in state UP)
GroupCoordinator: b-2.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaws.com:9094: Connect to ipv4#<ip-address-3>:9094 failed: Connection refused (after 1ms in state CONNECT, 1 identical error(s) suppressed)
[thrd:GroupCoordinator]: GroupCoordinator: b-2.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaws.com:9094: Connect to ipv4#<ip-address-3>:9094 failed: Connection refused (after 1ms in state CONNECT, 1 identical error(s) suppressed)
ssl://b-2.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaws.com:9094/2: Connect to ipv4#<ip-address-3>:9094 failed: Connection refused (after 1ms in state CONNECT, 1 identical error(s) suppressed)
[thrd:ssl://b-2.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaw]: ssl://b-2.devmsk.<unique-id-1>.c17.kafka.ap-south-1.amazonaws.com:9094/2: Connect to ipv4#<ip-address-3>:9094 failed: Connection refused (after 1ms in state CONNECT, 1 identical error(s) suppressed)
Please provide the following information:
- [x] #2191
- [x] Confluent.Kafka nuget version.
- [x] Apache Kafka version.
- [x] Client configuration.
- [ ] Operating system.
- [x] Provide logs (with "debug" : "..." as necessary in configuration).
- [ ] Provide broker log excerpts.
- [ ] Critical issue.
Similar unanswered Issues:
- https://github.com/confluentinc/librdkafka/issues/3569
- https://github.com/confluentinc/librdkafka/discussions/4188
Any updates on this? I am facing the same issue during security patching of an MSK cluster.
Was the broker reachable? Can you provide more logs?
After extending the logging to include the remaining properties in the SetLogHandler() and SetErrorHandler() implementations, I found:
- For SetErrorHandler(), each "Error" object has 'Code' as Local_Transport or Local_AllBrokersDown, and Reason (as shared above).
- For SetLogHandler(), each "LogMessage" object has 'Name' as rdkafka#consumer-1 or rdkafka#producer-1, 'Facility' as FAIL, and Message (as shared above).
NOTE: All the above logs are produced with the log level set to Info, Warning, or Error. Nothing else is produced even after enabling the Debug log level.
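For anyone trying to capture the same fields, the handler wiring described above might look roughly like the following. This is a minimal sketch, not the reporter's actual code: the bootstrap address, group id, and the `Ignore`/`string` key and value types are placeholders, and `SecurityProtocol.Ssl` is assumed only because the logs show `ssl://` broker URLs.

```csharp
using System;
using Confluent.Kafka;

// Placeholder connection values (assumptions, not taken from this issue).
var config = new ConsumerConfig
{
    BootstrapServers = "<broker-1>:9094,<broker-2>:9094",
    GroupId = "example-group",
    SecurityProtocol = SecurityProtocol.Ssl,
    EnableAutoCommit = false,
    EnableAutoOffsetStore = false
};

using var consumer = new ConsumerBuilder<Ignore, string>(config)
    // Invoked for client-level errors such as Local_Transport or Local_AllBrokersDown.
    .SetErrorHandler((_, e) =>
        Console.WriteLine($"Error: Code={e.Code} IsFatal={e.IsFatal} Reason={e.Reason}"))
    // Invoked for every librdkafka log line; no level filtering is applied here.
    .SetLogHandler((_, m) =>
        Console.WriteLine($"Log: Name={m.Name} Level={m.Level} Facility={m.Facility} Message={m.Message}"))
    .Build();
```

Printing `IsFatal` alongside `Code` helps distinguish errors the client is expected to recover from on its own from ones that require recreating the consumer.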
@anchitj I am getting the same error after MSK patching activity. After this, the Kafka client (consumer code) is not able to connect again, and the only option is to restart the pods (service). Kindly let me know if there is any way to configure the consumer code to re-initiate the connection.
%4|1710193417.943|FAIL|rdkafka#consumer-9886| [thrd:sasl_ssl://brokeraddress.amazonaws.com]: sasl_ssl://b-3.msk-wbrokeraddress.amazonaws.com:9096/3: Connection setup timed out in state APIVERSION_QUERY (after 29924ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
%4|1710193447.946|FAIL|rdkafka#consumer-9886| [thrd:sasl_ssl://b-2.brokeraddress.amazonaws.com]: sasl_ssl://b-2.msk-wbroker.amazonaws.com:9096/2: Connection setup timed out in state APIVERSION_QUERY (after 29912ms in state APIVERSION_QUERY, 1 identical error(s) suppressed)
The client should keep retrying on its own, and this error should be transient. Please try to reproduce once again with Debug="all" and upload the logs here.
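As a sketch of what is being asked for, `Debug` is a property on the client configuration itself (it maps to librdkafka's `debug` setting), separate from any log level configured in the application's own logger. The bootstrap address and group id below are placeholders, not values from this issue.

```csharp
using Confluent.Kafka;

var config = new ConsumerConfig
{
    BootstrapServers = "<broker-1>:9094",  // placeholder
    GroupId = "example-group",             // placeholder
    EnableAutoCommit = false,
    EnableAutoOffsetStore = false,
    // Enables all librdkafka debug contexts; the output is verbose.
    Debug = "all"
};
```

If a log handler is registered with SetLogHandler(), the debug lines are delivered to it, so the handler should not drop messages at the Debug level.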