takahe
Refactor docker-compose to use a setup job
I'm not sure how much or even if you use the docker-compose code for development, but I noticed a couple things that have bit me on other projects so I made this PR to see if this is somewhere that I can help since I do a lot of Docker stuff these days.
First an explanation:
When running a Django project in a cluster-like deployment such as Kubernetes, we found that the pattern of having a Docker image start by running migrations was a Bad Idea, 'cause your deployment would spin up 5 web pods and all 5 would try to run the migrations at the same time, leading to some rather scary/confusing results. On top of that, it increases your pod start time, which sucks for scalability (horizontal pod autoscaling, etc.).
The fix that made the most sense for us was instead to use a setup job. The same Docker image, but deployed with a different command argument that simply ran migrate and collectstatic and then exited with 0. That way, you could stand up 100 instances quickly, scaling up and down without any risk of running migrations for each one.
To replicate that experience in development, (which is the only place we use docker-compose), we have two "services": web and setup. The former runs the webserver until it's stopped, while the latter runs the setup and then dies happy.
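The two-service layout described above might look something like this in a compose file. This is a hedged sketch only — the service names, image names, and passwords here are illustrative and not taken from the actual PR:

```yaml
# docker-compose sketch (illustrative, not the PR's actual file)
services:
  setup:
    image: takahe:latest
    # Run migrations + collectstatic once, then exit 0
    command: >
      sh -c "/takahe/manage.py migrate &&
             /takahe/manage.py collectstatic --noinput"
    depends_on:
      - db

  web:
    image: takahe:latest
    command: /takahe/manage.py runserver 0.0.0.0:8000
    ports:
      - "8000:8000"
    depends_on:
      setup:
        # Compose v2 long syntax: wait for the setup job to exit cleanly
        condition: service_completed_successfully

  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: insecure-dev-only-password
```

The `service_completed_successfully` condition is what makes the "dies happy" pattern work: `web` won't start until `setup` has exited with code 0.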
If you agree that this is a good design, then 2 more things need to be changed:
- The `Dockerfile` should have its `CMD` line removed, as that's defined in the deployment anyway. In `docker-compose` we use `runserver`, and for production we'd use `gunicorn takahe.wsgi:application -b 0.0.0.0:8000`.
- The `start.sh` script can be deleted entirely.
Finally, adding an ENTRYPOINT to the Dockerfile that confirms the availability of the database before starting the webserver is probably a good idea. Without it, the server can start without an active connection and just rejects HTTP requests 'til you restart the service manually (ew). In the current setup, this doesn't happen because as the webserver starts up, it runs migrate, which fails until the database is up. Something like this is probably sufficient:
```bash
#!/bin/bash
# Wait until the database accepts TCP connections, then hand off to CMD.
echo "Checking for a working connection to the database"
until nc -z "${PGHOST}" 5432; do
  sleep 1
done
exec "$@"
```
...but you could do something more Django-esque if you like: `/takahe/manage.py check_ready`
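For completeness, wiring that script into the image would look roughly like the fragment below. The `docker/entrypoint.sh` path is my guess for illustration, not something specified in the PR:

```dockerfile
# Hypothetical Dockerfile fragment -- the script path is an assumption
COPY docker/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
# No CMD here: docker-compose and the k8s deployment supply the command,
# which the entrypoint runs via `exec "$@"` once the database is reachable
```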
Anyway, this is all meant as a "use this if you feel like it" PR. Doing my own development, I can always just use a docker-compose.override.yaml file and work any way I like, but since I was hoping to deploy Takahē to my local k3s cluster, I started hacking on it from that point of view.
Regardless of whether you use this PR or not, I hope you'll consider some other options for working around running migrate at the start of every webservice.
Yes, I agree with this - it was on my list of things to do as well (takahe.social runs in a small k8s cluster, and I need to move migrations to an initContainer). If you want to make the changes you suggested, I'll take them in.
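For reference, the initContainer approach mentioned here might look roughly like this sketch — the names and image tags are made up for illustration, since the actual takahe.social manifests aren't in this thread:

```yaml
# Hypothetical Deployment fragment -- names/images are illustrative
spec:
  template:
    spec:
      initContainers:
        - name: migrations
          image: takahe:latest
          # Runs to completion before the app containers start
          command: ["/takahe/manage.py", "migrate"]
      containers:
        - name: web
          image: takahe:latest
          command: ["gunicorn", "takahe.wsgi:application", "-b", "0.0.0.0:8000"]
```

Kubernetes guarantees that all initContainers finish successfully before the pod's main containers start, which keeps migrations out of the webserver's startup path.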
(not really sure where to discuss this - apologies if this is the wrong place)
Checking in on the commits from
https://github.com/andrewgodwin/takahe/commit/a43ccde8d99afb0cf58dec07e79dcc502ab790eb
I'm following the instructions in the readme and getting stuck
docker-web-1 | /usr/local/lib/python3.11/site-packages/whitenoise/base.py:115: UserWarning: No directory at: /takahe/static-collected/
docker-web-1 | warnings.warn(f"No directory at: {root}")
on this branch (refactor-docker-compose) I am getting an analogous error:
docker-setup-1 | Operations to perform:
docker-setup-1 | Apply all migrations: activities, admin, auth, contenttypes, core, sessions, stator, users
docker-setup-1 | Running migrations:
docker-setup-1 | No migrations to apply.
docker-setup-1 exited with code 0
docker-web-1 | /usr/local/lib/python3.11/site-packages/whitenoise/base.py:115: UserWarning: No directory at: /takahe/static-collected/
I'm able to run the app without issue at f8f4fa8665ef61caf5266c980046305f3b779c6d
That should be created by https://github.com/andrewgodwin/takahe/blob/main/docker/Dockerfile#L15 - can you poke inside your docker image and see what's there?
Ah wait, it's because the root got mounted. It should go away if you switch the compose file to use the "development" settings - maybe do that and rename it docker-compose-development.yaml so someone doesn't run it in prod?
Worked - and it seems obvious now that you point it out.
I've done this refactor separately on main with a few other tweaks, so closing this out!
@andrewgodwin Was your aforementioned refactoring meant to address the "production vs. development config being referenced in docker-compose" issue you mentioned in https://github.com/jointakahe/takahe/issues/14#issuecomment-1321237787? Because I can still reproduce that same issue (DJANGO_SETTINGS_MODULE must be manually set to takahe.settings.development for development with docker-compose.)
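For anyone hitting this in the meantime, the workaround is presumably just setting the variable on the web service in an override file — hedged, since the exact service name depends on the compose file in your checkout:

```yaml
# docker-compose.override.yaml sketch (service name is an assumption)
services:
  web:
    environment:
      DJANGO_SETTINGS_MODULE: takahe.settings.development
```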
Should this be captured as a new issue, and/or at least mentioned in the Docker development instructions in CONTRIBUTING.md?
Forgot about that - just pushed that up now. I can't actually run docker-compose locally (I use podman for containers) so any testing of it is appreciated.
Confirmed working for development. 👍🏻 I guess I had imagined being able to switch compose between the development/production settings at will rather than hardcoding to one or the other, but defaulting to development certainly makes sense for the time being!
Yeah, that compose file is not suitable for production (uses runserver) so I think it's fine just making it be dev-only. It probably needs a big DO NOT USE THIS IN PROD warning on it though.