jupyterhub-deploy-docker
Docker in docker container design question
I'm still relatively new to Docker, and it's moving so fast that it's a little hard to find up-to-date information on best practices, or on how well something like running Docker inside a Docker container is handled by the containerization layer.
It seems like it's probably low overhead, but I can't find anything that explicitly says "this is a good design pattern and should be used when inter-container coordination is too difficult."
Curious what the pros and cons are to a setup like this: https://github.com/jupyterhub/oauthenticator/tree/master/example
After tweaking some old configuration, I was able to get that example up and running, with persistence on a volume like this: -v hubdata:/home.
Need to do some more testing to see how well the isolation is handled, and implications of shared users. But, for running a hub server for a small team of devs, avoiding docker-ception seems to be sufficient and less complex - barring some performance considerations that I am missing.
Penny for your thoughts on the pros and cons of single docker container vs multiple docker containers?
Edit: Revisited the diagram, and it looks like the spawned containers are sitting at the same level as the hub itself. Trying to trace the need for:
# install docker on the jupyterhub container
RUN wget https://get.docker.com -q -O /tmp/getdocker && \
    chmod +x /tmp/getdocker && \
    sh /tmp/getdocker
In this reference implementation, JupyterHub components run in one container, and each individual user notebook server runs in a separate container, all on the same host. DockerSpawner spawns the user containers from within the JHub container, so the Docker client is installed in the JHub container to make that possible.
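As a rough sketch of how that wiring can look, the fragment below points JupyterHub at DockerSpawner from a jupyterhub_config.py file. It assumes the host's Docker socket (/var/run/docker.sock) is mounted into the Hub container, so the containers it spawns are siblings on the host rather than children nested inside the Hub container. The image name, network name, volume path, the HUB_HOSTNAME variable, and the configure_hub helper are illustrative assumptions, not values taken from the reference deployment.

```python
# jupyterhub_config.py (sketch): sibling user containers via DockerSpawner.
# Image, network, volume, and HUB_HOSTNAME values are illustrative assumptions.
import os


def configure_hub(c, image="jupyter/base-notebook", network="jupyterhub-network"):
    """Apply DockerSpawner settings to a JupyterHub config object `c`."""
    # Spawn each user's notebook server in its own container.
    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    c.DockerSpawner.image = image
    # Hub and user containers share a Docker network so they can reach each other.
    c.DockerSpawner.network_name = network
    # Listen on all interfaces inside the Hub container ...
    c.JupyterHub.hub_ip = "0.0.0.0"
    # ... and tell user containers which hostname to use to reach the Hub.
    c.JupyterHub.hub_connect_ip = os.environ.get("HUB_HOSTNAME", "jupyterhub")
    # One named volume per user keeps notebooks across container restarts.
    c.DockerSpawner.volumes = {"jupyterhub-user-{username}": "/home/jovyan/work"}
    # Remove stopped user containers; the data lives on in the named volumes.
    c.DockerSpawner.remove = True
    return c


# JupyterHub injects get_config() when it loads this file; skip when run directly.
if "get_config" in globals():
    configure_hub(get_config())
```

With a setup along these lines, the Hub container only needs a way to talk to the host's Docker daemon; no Docker daemon runs inside the Hub container itself.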
It is a generally accepted best practice to run one process per container. If I were to start a new JupyterHub deployment, I would use multiple containers, and leverage a container orchestration framework like Kubernetes, so I could scale it. The Jupyter community is doing work in this area with zero-to-jupyterhub on Kubernetes. I've also seen a JupyterHub deployment on Google Cloud Platform. Or, if you're an AWS shop, maybe k8s on AWS is for you.
Why Kubernetes? Because I use k8s to run other stuff. On the other hand, I know people who run JupyterHub in a completely Docker-free setup, on a single large machine, and it works just fine for them.
The point is, I don't think there's a correct answer on how to run JupyterHub, Docker or not. Use whatever setup is best for your situation.