Can't get stack working with docker network
I'm trying to create a docker-compose file to bring up a dev stack on Docker 1.9.1, which has the new network support in favour of which --link has been deprecated. Ignoring the deprecation and using this docker-compose.yml with links works fine:
zookeeper:
  image: confluent/zookeeper
  ports:
    - "2181:2181"
kafka:
  image: confluent/kafka
  ports:
    - "9092:9092"
  links:
    - zookeeper
schema-registry:
  image: confluent/schema-registry
  ports:
    - "8081:8081"
  links:
    - zookeeper
    - kafka
rest-proxy:
  image: confluent/rest-proxy
  ports:
    - "8082:8082"
  links:
    - zookeeper
    - kafka
    - schema-registry
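For what it's worth, my quick sanity check of the links-based stack, assuming the host port mappings above, is roughly:

  # bring the links-based stack up in the background
  docker-compose up -d

  # the Schema Registry answers on the mapped port once it has finished starting;
  # GET /subjects returns the (initially empty) list of registered subjects
  curl http://localhost:8081/subjects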
But when I try to use a Docker 1.9 network bridge (called confluent) with this docker-compose.yml, the Schema Registry won't start:
zookeeper:
  image: confluent/zookeeper
  container_name: zookeeper
  net: confluent
  ports:
    - 2181:2181
kafka:
  image: confluent/kafka
  container_name: kafka
  net: confluent
  ports:
    - 9092:9092
  environment:
    # Use zookeeper:2181, which is defined in the container's /etc/hosts
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    # The kafka hostname should point to this container in the other containers' /etc/hosts
    KAFKA_ADVERTISED_HOST_NAME: kafka
schema-registry:
  image: confluent/schema-registry
  container_name: registry
  net: confluent
  ports:
    - 8081:8081
  environment:
    SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL: zookeeper:2181
rest-proxy:
  image: confluent/rest-proxy
  container_name: rest-proxy
  net: confluent
  ports:
    - 8082:8082
  environment:
    RP_SCHEMA_REGISTRY_URL: registry:8081
    RP_ZOOKEEPER_CONNECT: zookeeper:2181
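For completeness, the confluent network referenced by net: has to exist before the stack is started (the net: option just passes the name through to the container), so I'm bringing it up along these lines:

  # with Docker 1.9 this creates a user-defined bridge network by default
  docker network create confluent
  docker-compose up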
The logs from the registry container look like this:
Starting registry
Attaching to registry
registry | SLF4J: Class path contains multiple SLF4J bindings.
registry | SLF4J: Found binding in [jar:file:/usr/share/java/confluent-common/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
registry | SLF4J: Found binding in [jar:file:/usr/share/java/schema-registry/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
registry | SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
registry | SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
registry | [2015-12-18 17:04:48,636] INFO SchemaRegistryConfig values:
registry | master.eligibility = true
registry | port = 8081
registry | kafkastore.timeout.ms = 500
registry | kafkastore.init.timeout.ms = 60000
registry | debug = false
registry | kafkastore.zk.session.timeout.ms = 30000
registry | request.logger.name = io.confluent.rest-utils.requests
registry | metrics.sample.window.ms = 30000
registry | schema.registry.zk.namespace = schema_registry
registry | kafkastore.topic = _schemas
registry | avro.compatibility.level = backward
registry | shutdown.graceful.ms = 1000
registry | response.mediatype.preferred = [application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json]
registry | metrics.jmx.prefix = kafka.schema.registry
registry | host.name = 2ea69d1754e9
registry | metric.reporters = []
registry | kafkastore.commit.interval.ms = -1
registry | kafkastore.connection.url = zookeeper:2181
registry | metrics.num.samples = 2
registry | response.mediatype.default = application/vnd.schemaregistry.v1+json
registry | kafkastore.topic.replication.factor = 3
registry | (io.confluent.kafka.schemaregistry.rest.SchemaRegistryConfig:135)
registry | [2015-12-18 17:04:49,152] INFO Initialized the consumer offset to -1 (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:87)
registry | [2015-12-18 17:04:49,807] WARN The replication factor of the schema topic _schemas is less than the desired one of 3. If this is a production environment, it's crucial to add more brokers and increase the replication factor of the topic. (io.confluent.kafka.schemaregistry.storage.KafkaStore:201)
registry | [2015-12-18 17:04:49,886] INFO [kafka-store-reader-thread-_schemas], Starting (io.confluent.kafka.schemaregistry.storage.KafkaStoreReaderThread:68)
registry | [2015-12-18 17:05:49,967] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication:57)
registry | io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryInitializationException: Error initializing kafka store while initializing schema registry
registry | at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:164)
registry | at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:55)
registry | at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.setupResources(SchemaRegistryRestApplication.java:37)
registry | at io.confluent.rest.Application.createServer(Application.java:104)
registry | at io.confluent.kafka.schemaregistry.rest.Main.main(Main.java:42)
registry | Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreInitializationException: io.confluent.kafka.schemaregistry.storage.exceptions.StoreException: Failed to write Noop record to kafka store.
registry | at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:151)
registry | at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.init(KafkaSchemaRegistry.java:162)
registry | ... 4 more
registry | Caused by: io.confluent.kafka.schemaregistry.storage.exceptions.StoreException: Failed to write Noop record to kafka store.
registry | at io.confluent.kafka.schemaregistry.storage.KafkaStore.getLatestOffset(KafkaStore.java:363)
registry | at io.confluent.kafka.schemaregistry.storage.KafkaStore.waitUntilKafkaReaderReachesLastOffset(KafkaStore.java:220)
registry | at io.confluent.kafka.schemaregistry.storage.KafkaStore.init(KafkaStore.java:149)
registry | ... 5 more
registry exited with code 1
And while registry is trying to start, the kafka container logs this a bunch of times:
kafka | [2015-12-18 17:05:50,344] INFO Closing socket connection to /172.19.0.2. (kafka.network.Processor)
(I checked, and 172.19.0.2 was the IP address of the registry container.)
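For reference, I read that address with docker inspect along these lines (the format string is just one way to pull it out of the network settings):

  # print the registry container's address on its user-defined network
  docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' registry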
And then after it fails, the zookeeper container logs this:
zookeeper | [2015-12-18 17:05:50,344] WARN caught end of stream exception (org.apache.zookeeper.server.NIOServerCnxn)
zookeeper | EndOfStreamException: Unable to read additional data from client sessionid 0x151b60caa6b0004, likely client has closed socket
zookeeper | at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
zookeeper | at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
zookeeper | at java.lang.Thread.run(Thread.java:745)
zookeeper | [2015-12-18 17:05:50,348] INFO Closed socket connection for client /172.19.0.2:38991 which had sessionid 0x151b60caa6b0004 (org.apache.zookeeper.server.NIOServerCnxn)
zookeeper | [2015-12-18 17:05:50,349] WARN caught end of stream exception (org.apache.zookeeper.server.NIOServerCnxn)
zookeeper | EndOfStreamException: Unable to read additional data from client sessionid 0x151b60caa6b0005, likely client has closed socket
zookeeper | at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
zookeeper | at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
zookeeper | at java.lang.Thread.run(Thread.java:745)
zookeeper | [2015-12-18 17:05:50,350] INFO Closed socket connection for client /172.19.0.2:38992 which had sessionid 0x151b60caa6b0005 (org.apache.zookeeper.server.NIOServerCnxn)
zookeeper | [2015-12-18 17:05:56,000] INFO Expiring session 0x151b60caa6b0005, timeout of 6000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
zookeeper | [2015-12-18 17:05:56,001] INFO Processed session termination for sessionid: 0x151b60caa6b0005 (org.apache.zookeeper.server.PrepRequestProcessor)
zookeeper | [2015-12-18 17:06:20,000] INFO Expiring session 0x151b60caa6b0004, timeout of 30000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
zookeeper | [2015-12-18 17:06:20,000] INFO Processed session termination for sessionid: 0x151b60caa6b0004 (org.apache.zookeeper.server.PrepRequestProcessor)
I'm OK with using links for now, but it would be nice to get this resolved.
Addendum: I Googled "Failed to write Noop record to kafka store" and found some issues on the schema-registry repo which mention it, but none of the workarounds there made any difference in this case.
@markrendle This looks like a limitation in docker-compose. See https://docs.docker.com/compose/faq/#how-do-i-get-compose-to-wait-for-my-database-to-be-ready-before-starting-my-application
What's happening (based on how I saw it run locally) is that the schema-registry is started before the services it depends on are ready (other services also start out of order, e.g. the rest-proxy image was pulled and its container started first).
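That FAQ entry basically suggests wrapping each container's command in a script that waits for its dependencies before starting the real process. A minimal sketch of that approach for the schema-registry container (the nc check and the /usr/bin/schema-registry-start path are assumptions for illustration, not something the current images ship):

  #!/bin/sh
  # wait-for-zookeeper.sh -- hypothetical wrapper entrypoint
  # Block until ZooKeeper accepts connections, then exec the real start script.
  until nc -z zookeeper 2181; do
    echo "waiting for zookeeper:2181..." >&2
    sleep 2
  done
  exec /usr/bin/schema-registry-start /etc/schema-registry/schema-registry.properties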
Unfortunately, docker-compose seems to assume that it exists in a vacuum, i.e. that nothing else in production (init daemons, YARN/Mesos/similar) would manage health checks and restart failed processes, and in turn that you never want to treat a non-zero exit code as a useful indication of errors... The version with links works because it correctly defines the dependencies, and for historical reasons docker-compose has to respect them.
To make this work reliably with docker-compose, I think we'd need to run everything in these docker-images under an init daemon configured to always restart processes that die, which is the behavior docker-compose seems to assume.
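As a rough illustration of that idea, the crudest version is an entrypoint that simply restarts the process whenever it exits (again only a sketch; the start command path is assumed, not taken from the image):

  #!/bin/sh
  # crude stand-in for an init daemon: restart the process whenever it exits
  while true; do
    /usr/bin/schema-registry-start /etc/schema-registry/schema-registry.properties
    echo "schema-registry exited with status $?; restarting in 5 seconds" >&2
    sleep 5
  done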
That said, even extending the timeouts by setting

  SCHEMA_REGISTRY_KAFKASTORE_INIT_TIMEOUT_MS: 300000
  SCHEMA_REGISTRY_KAFKASTORE_TIMEOUT_MS: 300000

under the schema-registry entry in the docker-compose.yml does not fix the problem. It looks like the Kafka broker startup might be getting interrupted somehow by a shutdown request (although the controller resignation is a bit confusing, as there is only a single Kafka broker):
kafka | [2015-12-22 05:23:56,009] ERROR Error handling event ZkEvent[Data of /controller changed sent to kafka.server.ZookeeperLeaderElector$LeaderChangeListener@7e768ee6] (org.I0Itec.zkclient.ZkEventThread)
kafka | java.lang.IllegalStateException: Kafka scheduler has not been started
kafka | at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
kafka | at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
kafka | at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
kafka | at kafka.controller.KafkaController$$anonfun$2.apply$mcV$sp(KafkaController.scala:162)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:138)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:134)
kafka | at kafka.utils.Utils$.inLock(Utils.scala:535)
kafka | at kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:134)
kafka | at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:549)
kafka | at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
kafka | [2015-12-22 05:23:56,063] ERROR [KafkaApi-0] error when handling request Name: TopicMetadataRequest; Version: 0; CorrelationId: 48; ClientId: producer-1; Topics: _schemas (kafka.server.KafkaApis)
kafka | kafka.admin.AdminOperationException: replication factor: 1 larger than available brokers: 0
kafka | at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:70)
kafka | at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:171)
kafka | at kafka.server.KafkaApis$$anonfun$19.apply(KafkaApis.scala:520)
kafka | at kafka.server.KafkaApis$$anonfun$19.apply(KafkaApis.scala:503)
kafka | at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
kafka | at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
kafka | at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
kafka | at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
kafka | at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
kafka | at scala.collection.SetLike$class.map(SetLike.scala:93)
kafka | at scala.collection.AbstractSet.map(Set.scala:47)
kafka | at kafka.server.KafkaApis.getTopicMetadata(KafkaApis.scala:503)
kafka | at kafka.server.KafkaApis.handleTopicMetadataRequest(KafkaApis.scala:542)
kafka | at kafka.server.KafkaApis.handle(KafkaApis.scala:62)
kafka | at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
kafka | at java.lang.Thread.run(Thread.java:745)
I don't have any concrete next steps to suggest yet, aside from adjusting the different services to wait long enough for their dependencies to start up (which I'm not sure will work reliably if the images have to be pulled first too...), but I figured I'd at least document some initial findings.