
[BUG] `Application failed to start after update` when docker compose watch tries to build multiple dependent services simultaneously

8ma10s opened this issue 2 years ago • 4 comments

Description

I have a compose file with the following properties:

  • multiple services
  • some of the services share the same Dockerfile and use the same rebuild condition (when one file changes, all of those containers are rebuilt)
  • dependencies between the containers

Sample compose file (note: the issue is probably still reproducible without redis and localstack, but I'm including them just in case):

  services:
    redis:
      image: redis:5.0.8
      ports:
        - "6379"
    rails-server:
      build: &rails_build
        target: local
      develop: &rails_develop
        watch:
          - action: sync
            path: .
            target: /app
          - action: rebuild
            path: Gemfile.lock
      command: /bin/bash -c "./bin/rails server -b 0.0.0.0"
      depends_on:
        redis:
          condition: service_started
        localstack:
          condition: service_healthy
        worker-alpha:
          condition: service_started
        worker-beta:
          condition: service_started
    worker-alpha:
      build: *rails_build
      develop: *rails_develop
      command: /bin/bash -c "bundle exec aws_sqs_active_job --queue worker_alpha_queue 1"
      depends_on:
        redis:
          condition: service_started
        localstack:
          condition: service_healthy
      tty: true
      stdin_open: true
    worker-beta:
      build: *rails_build
      develop: *rails_develop
      command: /bin/bash -c "bundle exec aws_sqs_active_job --queue worker_beta_queue 1"
      depends_on:
        redis:
          condition: service_started
        localstack:
          condition: service_healthy
    localstack:
      image: localstack/localstack
      healthcheck:
        test: curl -s http://localhost:4566/_localstack/init/ready | grep '"completed":\ true'
        interval: 5s
        start_period: 30s
        timeout: 1s
  • rails-server, worker-alpha, and worker-beta share the same Dockerfile and rebuild condition
  • rails-server depends on worker-alpha, worker-beta, redis, and localstack
  • worker-alpha and worker-beta depend on redis and localstack

In this case, the three services rails-server, worker-alpha, and worker-beta are all rebuilt when I change Gemfile.lock. However, the rebuild sequence fails with the following message:

 ✔ Container project-redis-1                 Running    0.0s
 ✔ Container project-localstack-1            Running    0.0s
 ✔ Container project-worker-alpha-1          Running    0.0s
 ✔ Container project-worker-beta-1           Running    0.0s
[+] Running 3/3
 ✔ Container project-redis-1                 Running    0.0s
 ✔ Container project-localstack-1            Running    0.0s
 ✔ Container project-worker-alpha-1          Recreated  0.2s
 ⠋ Container f953c01259a6_project-rails-1    Recreate   0.0s
Application failed to start after update

From what I see in the logs and terminal output, it looks as if each service's rebuild sequence triggers the build, shutdown, and launch steps individually. So if 3 services are to be rebuilt at the same time, each service's rebuild sequence will individually trigger:

  • rebuild of each dependent service
  • shutdown of the service itself and all dependent services
  • launch of the service itself and all dependent services

All of this happens in an arbitrary order, resulting in conflicting containers and unexpected services being shut down.

I thought it may be related to https://github.com/docker/compose/issues/10863, but it still happened with compose version 2.23.0.


I think, ideally, docker compose should do the following in this case:

  • determine which services need to be rebuilt, and create a list of services to rebuild
  • rebuild each service (only once per service)
  • shut down and relaunch (again, only once per service)

Steps To Reproduce

  1. create a docker compose file such that rebuilds of multiple services are triggered simultaneously, and those rebuilt services depend on each other (a stripped-down sketch follows below)
  2. trigger a rebuild by changing the watched file
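
A stripped-down sketch of such a file, using hypothetical service and file names (app, helper, shared.lock): two services share one build context, both watch the same file, and one depends on the other.

  services:
    app:
      build: .
      develop:
        watch:
          - action: rebuild
            path: shared.lock   # hypothetical shared file; both services watch it
      depends_on:
        helper:
          condition: service_started
    helper:
      build: .
      develop:
        watch:
          - action: rebuild
            path: shared.lock

Running `docker compose watch` (or `docker-compose watch`) and then touching shared.lock should kick off the two overlapping rebuild sequences described above.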

Compose Version

% docker-compose version
Docker Compose version 2.23.0

Docker Environment

The info below says Docker Compose version 2.22.0, but I use `docker-compose watch`, not `docker compose watch`, to trigger `watch`, as stated above.


Client: Docker Engine - Community
 Version:    24.0.7
 Context:    orbstack
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.1
    Path:     /Users/sasaki.yamato/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0
    Path:     /Users/sasaki.yamato/.docker/cli-plugins/docker-compose

Server:
 Containers: 12
  Running: 2
  Paused: 0
  Stopped: 10
 Images: 48
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: btrfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8c087663b0233f6e6e2f4515cee61d49f14746a8
 runc version: 82f18fe0e44a59034f3e1f45e475fa5636e539aa
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.5.9-orbstack-00110-gce15a5dc65fa
 Operating System: OrbStack
 OSType: linux
 Architecture: aarch64
 CPUs: 10
 Total Memory: 7.748GiB
 Name: orbstack
 ID: 3d86a60b-3135-4e50-b9bf-63cd0e342ab1
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
 Default Address Pools:
   Base: 192.168.215.0/24, Size: 24
   Base: 192.168.228.0/24, Size: 24
   Base: 192.168.247.0/24, Size: 24
   Base: 192.168.207.0/24, Size: 24
   Base: 192.168.167.0/24, Size: 24
   Base: 192.168.107.0/24, Size: 24
   Base: 192.168.237.0/24, Size: 24
   Base: 192.168.148.0/24, Size: 24
   Base: 192.168.214.0/24, Size: 24
   Base: 192.168.165.0/24, Size: 24
   Base: 192.168.227.0/24, Size: 24
   Base: 192.168.181.0/24, Size: 24
   Base: 192.168.158.0/24, Size: 24
   Base: 192.168.117.0/24, Size: 24
   Base: 192.168.155.0/24, Size: 24
   Base: 192.168.147.0/24, Size: 24
   Base: 192.168.229.0/24, Size: 24
   Base: 192.168.183.0/24, Size: 24
   Base: 192.168.156.0/24, Size: 24
   Base: 192.168.97.0/24, Size: 24
   Base: 192.168.171.0/24, Size: 24
   Base: 192.168.186.0/24, Size: 24
   Base: 192.168.216.0/24, Size: 24
   Base: 192.168.242.0/24, Size: 24
   Base: 192.168.166.0/24, Size: 24
   Base: 192.168.239.0/24, Size: 24
   Base: 192.168.223.0/24, Size: 24
   Base: 192.168.164.0/24, Size: 24
   Base: 192.168.163.0/24, Size: 24
   Base: 192.168.172.0/24, Size: 24
   Base: 172.17.0.0/16, Size: 16
   Base: 172.18.0.0/16, Size: 16
   Base: 172.19.0.0/16, Size: 16
   Base: 172.20.0.0/14, Size: 16
   Base: 172.24.0.0/14, Size: 16
   Base: 172.28.0.0/14, Size: 16


Anything else?

_No response_

8ma10s · Nov 15 '23

This may be related: https://github.com/docker/compose/issues/11079

8ma10s · Nov 20 '23

Hello @8ma10s. Compose watch wasn't designed for this kind of use case; it expects, at a minimum, to manage different paths for each service so it can apply a dedicated strategy. But since you're using the same build and develop configuration for your 3 services, you could try something: declare the develop section only on the rails-server service and add a `restart: true` configuration in the depends_on for both worker-alpha and worker-beta. I can't guarantee it will recreate the 2 worker services, but it's worth a try.
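
Roughly something like this, as a sketch only (the redis/localstack dependencies and other unchanged keys are omitted, and I'm not certain this exact shape will recreate the workers):

  services:
    rails-server:
      build: &rails_build
        target: local
      develop:                  # only rails-server keeps the watch configuration
        watch:
          - action: sync
            path: .
            target: /app
          - action: rebuild
            path: Gemfile.lock
      command: /bin/bash -c "./bin/rails server -b 0.0.0.0"
      depends_on:
        worker-alpha:
          condition: service_started
          restart: true         # restart: true added per the suggestion above
        worker-beta:
          condition: service_started
          restart: true
    worker-alpha:
      build: *rails_build       # same build anchor, but no develop/watch section
      command: /bin/bash -c "bundle exec aws_sqs_active_job --queue worker_alpha_queue 1"
    worker-beta:
      build: *rails_build
      command: /bin/bash -c "bundle exec aws_sqs_active_job --queue worker_beta_queue 1"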

glours · Nov 21 '23

@glours Thanks for your suggestion. I tried the setting you suggested, and in conclusion, it doesn't quite do what I want.

  • it does look like this restarts worker-alpha and worker-beta
  • however, because worker-alpha and worker-beta are now declared without a watch -> rebuild statement, Compose simply recreates their containers but doesn't rebuild the image.

Hopefully, we can somehow configure docker-compose so that it will rebuild all the images (without launch conflicts), but it seems like docker compose currently does not provide a way to do so.

8ma10s · Dec 07 '23

Bumped into the same issue, although the use case is slightly different: I have 3 distinct services that share the same image but launch with different commands.

Louis-Tian · Jan 17 '24

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] · Jul 24 '24