takahe
Refactor docker-compose to use a setup job
I'm not sure how much or even if you use the docker-compose code for development, but I noticed a couple things that have bit me on other projects so I made this PR to see if this is somewhere that I can help since I do a lot of Docker stuff these days.
First an explanation:
When running a Django project in a cluster-like deployment such as Kubernetes, we found that the pattern of having a Docker image start by running migrations was a Bad Idea, 'cause your deployment would spin up 5 web pods and all 5 would try to run the migrations at the same time, leading to some rather scary/confusing results. On top of that, it increases your pod start time, which sucks for scalability (horizontal pod autoscaling, etc.).
The fix that made the most sense for us was instead to use a setup job. The same Docker image, but deployed with a different command argument that simply ran migrate and collectstatic and then exited with 0. That way, you could stand up 100 instances quickly, scaling up and down without any risk of running migrations for each one.
To replicate that experience in development, (which is the only place we use docker-compose), we have two "services": web and setup. The former runs the webserver until it's stopped, while the latter runs the setup and then dies happy.
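The two-service layout described above might look something like this in a compose file. This is a hedged sketch only — the service names, image names, and passwords here are illustrative and not taken from the actual PR:

```yaml
# docker-compose sketch (illustrative, not the PR's actual file)
services:
  setup:
    image: takahe:latest
    # Run migrations + collectstatic once, then exit 0
    command: >
      sh -c "/takahe/manage.py migrate &&
             /takahe/manage.py collectstatic --noinput"
    depends_on:
      - db

  web:
    image: takahe:latest
    command: /takahe/manage.py runserver 0.0.0.0:8000
    ports:
      - "8000:8000"
    depends_on:
      setup:
        # Compose v2 long syntax: wait for the setup job to exit cleanly
        condition: service_completed_successfully

  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: insecure-dev-only-password
```

The `service_completed_successfully` condition is what makes the "dies happy" pattern work: `web` won't start until `setup` has exited with code 0.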
If you agree that this is a good design, then 2 more things need to be changed:
- The `Dockerfile` should have its `CMD` line removed, as that's defined in the deployment anyway. In `docker-compose` we use `runserver`, and for production we'd use `gunicorn takahe.wsgi:application -b 0.0.0.0:8000`.
- The `start.sh` script can be deleted entirely.
Finally, adding an ENTRYPOINT to the Dockerfile that confirms the availability of the database before starting the webserver is probably a good idea. Without it, the server can start without an active connection and just rejects HTTP requests 'til you restart the service manually (ew). In the current setup, this doesn't happen because as the webserver starts up, it runs migrate, which fails until the database is up. Something like this is probably sufficient:
```bash
#!/bin/bash
# Wait until the database accepts TCP connections, then hand off to CMD.
echo "Checking for a working connection to the database"
until nc -z "${PGHOST}" 5432; do
  sleep 1
done
exec "$@"
```
...but you could do something more Django-esque if you like: `/takahe/manage.py check_ready`
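For completeness, wiring that script into the image would look roughly like the fragment below. The `docker/entrypoint.sh` path is my guess for illustration, not something specified in the PR:

```dockerfile
# Hypothetical Dockerfile fragment -- the script path is an assumption
COPY docker/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
# No CMD here: docker-compose and the k8s deployment supply the command,
# which the entrypoint runs via `exec "$@"` once the database is reachable
```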
Anyway, this is all meant as a "use this if you feel like it" PR. Doing my own development, I can always just use a docker-compose.override.yaml file and work any way I like, but since I was hoping to deploy Takahē to my local k3s cluster, I started hacking on it from that point of view.
Regardless of whether you use this PR or not, I hope you'll consider some other options for working around running migrate at the start of every webservice.
Yes, I agree with this - it was on my list of things to do as well (takahe.social runs in a small k8s cluster, and I need to move migrations to an initContainer). If you want to make the changes you suggested, I'll take them in.
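For reference, the initContainer approach mentioned here might look roughly like this sketch — the names and image tags are made up for illustration, since the actual takahe.social manifests aren't in this thread:

```yaml
# Hypothetical Deployment fragment -- names/images are illustrative
spec:
  template:
    spec:
      initContainers:
        - name: migrations
          image: takahe:latest
          # Runs to completion before the app containers start
          command: ["/takahe/manage.py", "migrate"]
      containers:
        - name: web
          image: takahe:latest
          command: ["gunicorn", "takahe.wsgi:application", "-b", "0.0.0.0:8000"]
```

Kubernetes guarantees that all initContainers finish successfully before the pod's main containers start, which keeps migrations out of the webserver's startup path.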
(not really sure where to discuss this - apologies if this is the wrong place)
Checking in on the commits from
https://github.com/andrewgodwin/takahe/commit/a43ccde8d99afb0cf58dec07e79dcc502ab790eb
I'm following the instructions in the readme and getting stuck
docker-web-1 | /usr/local/lib/python3.11/site-packages/whitenoise/base.py:115: UserWarning: No directory at: /takahe/static-collected/
docker-web-1 | warnings.warn(f"No directory at: {root}")
on this branch (refactor-docker-compose) I am getting an analogous error:
docker-setup-1 | Operations to perform:
docker-setup-1 | Apply all migrations: activities, admin, auth, contenttypes, core, sessions, stator, users
docker-setup-1 | Running migrations:
docker-setup-1 | No migrations to apply.
docker-setup-1 exited with code 0
docker-web-1 | /usr/local/lib/python3.11/site-packages/whitenoise/base.py:115: UserWarning: No directory at: /takahe/static-collected/
I'm able to run the app without issue at f8f4fa8665ef61caf5266c980046305f3b779c6d
That should be created by https://github.com/andrewgodwin/takahe/blob/main/docker/Dockerfile#L15 - can you poke inside your docker image and see what's there?
Ah wait, it's because the root got mounted. It should go away if you switch the compose file to use the "development" settings - maybe do that and rename it docker-compose-development.yaml so someone doesn't run it in prod?
Worked - and it seems obvious now that you point it out.
I've done this refactor separately on main with a few other tweaks, so closing this out!
@andrewgodwin Was your aforementioned refactoring meant to address the "production vs. development config being referenced in docker-compose" issue you mentioned in https://github.com/jointakahe/takahe/issues/14#issuecomment-1321237787? Because I can still reproduce that same issue (DJANGO_SETTINGS_MODULE must be manually set to takahe.settings.development for development with docker-compose.)
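For anyone hitting this in the meantime, the workaround is presumably just setting the variable on the web service in an override file — hedged, since the exact service name depends on the compose file in your checkout:

```yaml
# docker-compose.override.yaml sketch (service name is an assumption)
services:
  web:
    environment:
      DJANGO_SETTINGS_MODULE: takahe.settings.development
```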
Should this be captured as a new issue, and/or at least mentioned in the Docker development instructions in CONTRIBUTING.md?
Forgot about that - just pushed that up now. I can't actually run docker-compose locally (I use podman for containers) so any testing of it is appreciated.
Confirmed working for development. 👍🏻 I guess I had imagined being able to switch compose between the development/production settings at will rather than hardcoding to one or the other, but defaulting to development certainly makes sense for the time being!
Yeah, that compose file is not suitable for production (uses runserver) so I think it's fine just making it be dev-only. It probably needs a big DO NOT USE THIS IN PROD warning on it though.