
fix: container startup order in docker-compose.yml

Open · olivermeyer opened this pull request 1 year ago · 3 comments

Description

This PR fixes the startup order of containers in the Docker deployment.

The initial issue I was having was that the playout container sometimes timed out while trying to get the liquidsoap version from the liquidsoap container. Restarting the playout container fixed it, because by that time liquidsoap was ready, but it's less than ideal. To fix this, I suggest adding a healthcheck to liquidsoap, and adding it to the playout's dependencies with the service_healthy condition.
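Concretely, the change could look something like this in docker-compose.yml (a sketch, not the exact patch: the healthcheck command and timings are assumptions, since liquidsoap doesn't expose a standard health endpoint):

```yaml
services:
  liquidsoap:
    # Hypothetical probe: succeed once the liquidsoap binary responds.
    healthcheck:
      test: ["CMD-SHELL", "liquidsoap --version"]
      interval: 5s
      timeout: 5s
      retries: 12

  playout:
    depends_on:
      liquidsoap:
        # Wait for the healthcheck to pass, not just for the container to start.
        condition: service_healthy
```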

While looking into this, I noticed that logs are often noisy at startup because containers use the default dependency condition (service_started) and ignore healthchecks. To improve this, I'm adding a healthcheck to the api service and switching all dependencies to service_healthy, except for the dependency from legacy to nginx, since legacy doesn't have a healthcheck.
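For the api service, the addition could look along these lines (the probe URL, port, and the dependent services shown are illustrative assumptions, not the exact patch):

```yaml
services:
  api:
    # Hypothetical probe against an API endpoint; the real path may differ.
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://localhost:9001/api/v2/version || exit 1"]
      interval: 10s
      retries: 6

  worker:
    depends_on:
      api:
        condition: service_healthy

  legacy:
    depends_on:
      nginx:
        # No healthcheck available here, so keep the default condition.
        condition: service_started
```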

The downside of this approach is that startup appears to take longer, because containers now wait for upstream services to be healthy rather than merely started. I don't think this has a real impact on overall startup time though, because downstream services wouldn't work until upstream services are ready anyway.

Testing Notes

What I did:

I applied these changes to my deployment a week ago and I've been monitoring it since. The issue with the playout container hasn't occurred since, and the logs are cleaner on startup.

How you can replicate my testing:

`docker compose up` and check that everything is running.

olivermeyer avatar May 14 '24 20:05 olivermeyer

Thanks a lot for the contribution, but I am unsure this is the fix for the initial problem. If the playout service has a timeout during the initialization phase, we should probably handle this problem first.

The services should be resilient to missing dependencies (at startup or after a crash), and have a built-in retry mechanism.
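As a sketch of what such a built-in retry could look like (this is illustrative Python, not LibreTime's actual playout code; `get_liquidsoap_version` stands in for whatever call currently times out):

```python
import time

def retry(fn, attempts=5, delay=0.5, backoff=2.0):
    """Call fn until it succeeds, sleeping with exponential backoff
    between attempts; re-raise the last error if all attempts fail."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)
            delay *= backoff

# Hypothetical usage: keep asking liquidsoap for its version until
# the container is actually ready to answer.
# version = retry(get_liquidsoap_version, attempts=10, delay=1.0)
```

With this in place, playout tolerates liquidsoap being slow to come up regardless of what the orchestrator does.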

@olivermeyer If we fix the playout init problem, do you think the dependency chain is still worth adding? It does add a lot of waiting, while the service should already handle everything gracefully.

jooola avatar Jun 07 '24 20:06 jooola

> Thanks a lot for the contribution, but I am unsure this is the fix for the initial problem. If the playout service has a timeout during the initialization phase, we should probably handle this problem first.
>
> The services should be resilient to a missing dependencies (startup or crash), and have a built-in retry mechanism.

Both solutions solve the issue, so it's really a matter of preference. I'm more familiar with Docker than I am with the playout service so I went that way, but if you would rather fix the playout service instead, that's fine by me. Perhaps the "best" (as in most robust) solution is to have both. For the sake of the argument: one benefit of handling this at the container level is that the health check is reusable, whereas handling the dependency in the playout service directly is not. But as long as the playout container can start, I'm happy.

> @olivermeyer If we fix the playout init problem, do you think the dependency chain is still worth adding ? Because it does add a lot of waiting, while the service should already handle everything gracefully.

I'm curious about this. Did you compare startup times for Libretime as a whole, with and without the healthcheck? The playout service cannot start before the liquidsoap service is ready, so removing the healthcheck just so the container "starts" earlier (and then retries until liquidsoap is ready anyway) might appear faster if you're watching the containers start, but end up taking the same amount of time. I didn't take the time to do the comparison myself, but I'd speculate both approaches are roughly equal.

TLDR: I'm happy either way :)

olivermeyer avatar Jun 10 '24 13:06 olivermeyer

Running libretime in a Docker Swarm environment has issues because nginx tries to resolve a container that does not exist yet at startup, so it caches an invalid DNS result, and the whole legacy app won't load until the nginx container is restarted. Docker Swarm ignores the depends_on property, so everything starts at once. Would this possibly fix that issue?
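One workaround on the nginx side, independent of dependency ordering, is to make nginx re-resolve the upstream at request time instead of caching the name at startup (a sketch; 127.0.0.11 is Docker's embedded DNS resolver, and the upstream name and port are illustrative assumptions):

```nginx
# Re-resolve names via Docker's embedded DNS instead of caching at startup.
resolver 127.0.0.11 valid=10s;

server {
    listen 80;
    location / {
        # Using a variable defers DNS resolution to request time, so nginx
        # can start even if "legacy" is not resolvable yet.
        set $legacy_upstream http://legacy:9000;
        proxy_pass $legacy_upstream;
    }
}
```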

dakriy avatar Sep 30 '24 16:09 dakriy