Windows WSL Docker NotCoordinatorException
Given a basic Docker Compose file with just Kafka and Schema Registry, I observe that Schema Registry appears to connect to Kafka fine at first, then apparently disconnects or re-discovers Kafka, attempts to re-join the group, and fails many times before eventually recovering. The log repeatedly displays:
JoinGroup failed: This is not the correct coordinator. Marking coordinator unknown.
and
Request joining group due to: rebalance failed due to 'This is not the correct coordinator.' (NotCoordinatorException)
Compose file:
services:
  kafka:
    image: bitnami/kafka:3.6.2
    hostname: kafka
    container_name: kafka
    ports:
      - "9094:9094"
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - ALLOW_PLAINTEXT_LISTENER=yes
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://:9094
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://localhost:9094
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=PLAINTEXT
  registry:
    image: bitnami/schema-registry:7.6.1
    hostname: registry
    container_name: registry
    depends_on:
      - kafka
    ports:
      - 8081:8081
    environment:
      - SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081
      - SCHEMA_REGISTRY_KAFKA_BROKERS=PLAINTEXT://kafka:9092
Docker logs: docker-logs.txt
The problem is likely environment-specific, as it occurs on only one machine, but I am posting here in case anyone has clues about what might be happening. Someone more familiar with reading the log output may have some tips. The Compose file works in many environments, including Red Hat Linux, macOS, and Windows 11 Home Edition, but the odd behavior is observed on Windows 11 Pro on an enterprise network. I am using the latest Docker Desktop (v4.31.1) with fully patched Windows 11 and WSL2.
The same warnings are printed many times, so scroll to the end of the log file to see the eventual recovery. My best guess is that an antivirus tool is interfering somehow, or that stale metadata is being used on startup for some reason, despite running docker compose down between attempts. It appears that a Raft re-election may have to occur before the Registry finally recovers. Odd. Please offer any troubleshooting tips you may have. Thanks!
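Since the warnings repeat many times before the recovery, a small script can make the attached log easier to skim. The following is only a rough sketch: it counts the two warnings quoted above and reports the last line on which either appears, which roughly marks where the recovery begins. The function name `summarize` and the pattern labels are made up for illustration, and the matching relies only on the message text quoted in this issue, not on any particular log layout.

```python
from collections import Counter

# Substrings copied from the two warnings quoted in this issue.
# The dictionary keys are illustrative labels, not Kafka terminology.
PATTERNS = {
    "join_failed": "JoinGroup failed: This is not the correct coordinator",
    "rebalance_failed": "rebalance failed due to 'This is not the correct coordinator.'",
}


def summarize(lines):
    """Count each warning and return the 1-based number of the last matching line."""
    counts = Counter()
    last_seen = None
    for number, line in enumerate(lines, start=1):
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
                last_seen = number
    return counts, last_seen
```

For example, `summarize(open("docker-logs.txt", encoding="utf-8", errors="replace"))` would return the counts and the line number of the final warning; the lines just after that point are the ones worth reading closely.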
Possibly related to: https://github.com/JeffersonLab/wildfly/issues/4, in which it was determined that there is a bug in Windows DNS that is generally exposed only on corporate networks (zones authoritative for upstream servers), because the DNS answers section erroneously includes authority info. I've tried both WSL dnsTunneling and explicitly specifying a corporate DNS server to work around this, and I can successfully configure WSL to resolve external hosts correctly, but this Kafka/Registry issue persists (though transiently: rebooting usually fixes it for one successful run, suggesting there is a caching issue somewhere).
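One way to test the DNS hypothesis might be to resolve the broker hostname repeatedly from a container on the same Compose network and see whether the answer changes or intermittently fails. The sketch below is an assumption-laden probe, not a diagnosis: `resolve_once` and `watch_resolution` are hypothetical helper names, it uses only Python's standard `socket.getaddrinfo`, and it presumes Python is available in whichever container it runs in (the Bitnami images may not ship it).

```python
import socket
import time


def resolve_once(hostname):
    """Return the sorted IP addresses the OS resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as error:
        # Record failures as an answer too, so intermittent errors are visible.
        return ("error", str(error))
    return tuple(sorted({info[4][0] for info in infos}))


def watch_resolution(hostname, attempts=10, delay=1.0,
                     resolver=resolve_once, sleep=time.sleep):
    """Resolve hostname repeatedly; map each distinct answer to how often it was seen."""
    seen = {}
    for attempt in range(attempts):
        answer = resolver(hostname)
        seen[answer] = seen.get(answer, 0) + 1
        if attempt < attempts - 1:
            sleep(delay)
    return seen
```

Calling `watch_resolution("kafka")` while the warnings are being logged would show whether `kafka` resolves to one stable address. More than one distinct answer, or a mix of addresses and errors, would point toward name resolution; a single stable answer would suggest looking elsewhere.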