Using containerd (and not Docker) with BinderHub

Open rkevin-arch opened this issue 5 years ago • 11 comments

Docker is deprecated as a container runtime in Kubernetes 1.20. For those who are switching to a different container runtime, we need a different solution for running repo2docker.

There is some documentation on using Docker-in-Docker, but from what I can tell it doesn't work with containerd as the container runtime (at least not with the current helm chart, which mounts dockersocket-host into the dind pod).

I've tested on Kubernetes 1.20.0 with containerd 1.4.3 and no Docker, and either the build pod (without DinD) or the DinD pod (with DinD) is not ready because Kubernetes can't mount /var/run/docker.sock into the pod.

Is there a particular reason why DinD needs access to the host docker.sock? Can DinD run in containerd instead of Docker? I can do some testing if needed.

rkevin-arch avatar Dec 12 '20 12:12 rkevin-arch

I wouldn't call this an enhancement, but I accidentally opened the issue with the label and now I can't remove it. Oh well.

rkevin-arch avatar Dec 12 '20 12:12 rkevin-arch

There are two main components in BinderHub:

  • The JupyterHub Helm chart which runs on a standard Kubernetes cluster
  • repo2docker which uses Docker for building images, so it currently requires access to the host Docker daemon (Docker-in-Docker is just a way to expose the host's Docker daemon inside a Docker container; it's not full virtualisation). There's some work on making repo2docker agnostic to the container engine: https://github.com/jupyterhub/repo2docker/pull/848. Any feedback or suggestions are welcome!

manics avatar Dec 12 '20 14:12 manics

Sorry, the pod that didn't work was actually not dind, it was the image cleaner. DinD seems to sort of work in containerd (the build was able to start), but there seems to be no network connectivity in the image being built. I'll investigate a bit later. It would be nice to make repo2docker build-system-agnostic, but I'm not sure how hard that would be (or what kind of non-docker-based container build systems are out there)

rkevin-arch avatar Dec 12 '20 14:12 rkevin-arch

@rkevin-arch :heart: :tada: thank you for exploring the issues that come from k8s deprecating docker as a CRI!

consideRatio avatar Dec 12 '20 15:12 consideRatio

Update 3: The tl;dr for the following: depending on your networking, you should set a smaller MTU for the dind docker daemon. I did the following and it solved the issue:

dind:
  enabled: true
  daemonset:
    image:
      name: docker
      tag: 19.03.14-dind
    extraArgs:
      - --mtu
      - "1400"

See also: this.
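A rough sketch of the arithmetic behind picking a value like 1400. It assumes the cluster network is an overlay that adds a fixed number of header bytes to every packet, and that the Docker daemon inside dind keeps its usual bridge MTU of 1500 unless told otherwise. The overhead figures and the helper names `pod_mtu` and `fits` are illustrative assumptions, not something taken from this thread or from BinderHub.

```python
# Bytes added by common overlay encapsulations. These figures are assumptions
# taken from general networking knowledge; confirm them for your CNI plugin.
OVERLAY_OVERHEAD = {
    "none": 0,
    "ipip": 20,   # one extra outer IPv4 header
    "vxlan": 50,  # outer IPv4 + UDP + VXLAN headers plus the inner Ethernet header
}


def pod_mtu(host_mtu, overlay):
    """Largest IP packet a pod interface can carry (hypothetical helper)."""
    return host_mtu - OVERLAY_OVERHEAD[overlay]


def fits(packet_size, mtu):
    """True if a packet of packet_size bytes passes without fragmentation."""
    return packet_size <= mtu


DOCKER_DEFAULT_MTU = 1500  # assumed default for the bridge created by dockerd

for overlay in ("ipip", "vxlan"):
    limit = pod_mtu(1500, overlay)
    print(overlay, "pod MTU:", limit,
          "| default-size packet fits:", fits(DOCKER_DEFAULT_MTU, limit),
          "| 1400-byte packet fits:", fits(1400, limit))
```

If the build container believes it can use 1500-byte packets while the pod interface underneath carries fewer, small transfers still fit and large ones can stall. That would be consistent with a short page loading while a longer one hangs, though it is only one plausible reading of the symptoms.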


Original comment:

Experiencing some really really weird stuff in the build container. I was able to manually run commands in the docker container being built by using kubectl exec to get a shell in the dind container, then using docker -H /run/dind/docker.sock to find the container currently being built and exec into it. Both networking and DNS seem to work properly, which is really weird. This also happened:

root@1adf42268642:/# curl http://archive.ubuntu.com -v
* Rebuilt URL to: http://archive.ubuntu.com/
*   Trying 91.189.88.142...
* TCP_NODELAY set
* Connected to archive.ubuntu.com (91.189.88.142) port 80 (#0)
> GET / HTTP/1.1
> Host: archive.ubuntu.com
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Sun, 13 Dec 2020 09:01:26 GMT
< Server: Apache/2.4.29 (Ubuntu)
< Vary: Accept-Encoding
< Content-Length: 696
< Content-Type: text/html;charset=UTF-8
< 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /</title>
 </head>
 <body>
<h1>Index of /</h1>
  <table>
   <tr><th valign="top"><img src="/icons/blank.gif" alt="[ICO]"></th><th><a href="?C=N;O=D">Name</a></th><th><a href="?C=M;O=A">Last modified</a></th><th><a href="?C=S;O=A">Size</a></th></tr>
   <tr><th colspan="4"><hr></th></tr>
<tr><td valign="top"><img src="/icons/folder.gif" alt="[DIR]"></td><td><a href="ubuntu/">ubuntu/</a></td><td align="right">2020-12-13 08:25  </td><td align="right">  - </td></tr>
   <tr><th colspan="4"><hr></th></tr>
</table>
<address>Apache/2.4.29 (Ubuntu) Server at archive.ubuntu.com Port 80</address>
</body></html>
* Connection #0 to host archive.ubuntu.com left intact
root@1adf42268642:/# curl -v 91.189.88.142:80
* Rebuilt URL to: 91.189.88.142:80/
*   Trying 91.189.88.142...
* TCP_NODELAY set
* Connected to 91.189.88.142 (91.189.88.142) port 80 (#0)
> GET / HTTP/1.1
> Host: 91.189.88.142
> User-Agent: curl/7.58.0
> Accept: */*
> 
^C

Basically, if I curl archive.ubuntu.com, it resolves to 91.189.88.142 and the page loads. If I curl 91.189.88.142:80 inside the build container, it hangs forever. This makes no sense to me, and I confirmed in the dind pod / host / my own machine that curl -v 91.189.88.142:80 works as well. It's only in the currently building Docker container that it has this issue.

The exact output repo2docker gives (after hanging for a very long time):

Step 3/72 : RUN apt-get -qq update &&     apt-get -qq install --yes --no-install-recommends locales > /dev/null &&     apt-get -qq purge &&     apt-get -qq clean &&     rm -rf /var/lib/apt/lists/*
 ---> Running in 1adf42268642
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic/InRelease  Connection failed [IP: 91.189.88.142 80]
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-updates/InRelease  Connection failed [IP: 91.189.88.152 80]
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/bionic-backports/InRelease  Connection failed [IP: 91.189.88.152 80]
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/bionic-security/InRelease  Connection failed [IP: 91.189.91.39 80]
W: Some index files failed to download. They have been ignored, or old ones used instead.
E: Package 'locales' has no installation candidate

I'll dig around more and see if there's anything I can find. If we can get dind working in containerd, it would be a great temporary solution until repo2docker stops relying on docker build


Update: curl -v http://archive.ubuntu.com/ubuntu/ hangs in the container being built, but not in the dind container itself. Can replicate using the following Dockerfile:

FROM buildpack-deps:bionic
RUN curl -v http://archive.ubuntu.com/ubuntu/

Meanwhile curl -v http://archive.ubuntu.com/ubuntu and curl -v http://archive.ubuntu.com/ both work. This is very weird.


Update 2: It might be an MTU issue. (By the way, apologies for spamming this thread. I hope other people will find this useful, but I'm not sure if this is getting off topic.)

rkevin-arch avatar Dec 13 '20 09:12 rkevin-arch

I've gotten binder to work with dind on top of containerd. Building and spawning images work, and I tested Thebe with it as well.

I did notice because dind and host containerd do not share an image cache, the image has to be pushed to the registry and repulled using the actual container runtime on the same host, which can be a bit slow during the first spawn. Other than that, I don't feel any differences using this binderhub and one running on top of docker (it's running at https://binder.galaxy.rkevin.dev, feel free to try it if you want).

I think it would be nice to add the following to the documentation:

If you are using containerd / CRI-O / some container runtime other than Docker, you must do the following:

  1. Enable DinD by setting dind.enabled.
  2. Depending on your network configuration, you should set an MTU smaller than 1500 for the DinD daemon using dind.daemonset.extraArgs. If your apt update during builds just hangs as if the remote server is not responding, definitely double check this. (I'm using Calico as the CNI plugin, and I needed to set a smaller MTU.)
  3. Disable the image cleaner entirely, or set imageCleaner.host.enabled to false.

Relevant parts of my helm values:

dind:
  enabled: true
  daemonset:
    image:
      name: docker
      tag: 19.03.14-dind
    extraArgs:
      - --mtu
      - "1400"

imageCleaner:
  host:
    enabled: false
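A minimal sketch of checking a set of helm values against the three steps above, assuming the values have already been loaded into a plain dictionary. The function name `check_containerd_values` is made up for illustration, and the top-level `imageCleaner.enabled` key used for "disable the image cleaner entirely" is an assumption about the chart, not something confirmed in this thread.

```python
from typing import Any, Dict, List


def check_containerd_values(values: Dict[str, Any]) -> List[str]:
    """Return hints for values that do not follow the three steps (hypothetical helper)."""
    hints: List[str] = []

    # Step 1: DinD must be enabled.
    dind = values.get("dind") or {}
    if dind.get("enabled") is not True:
        hints.append("set dind.enabled to true")

    # Step 2: look for "--mtu" followed by a value below 1500.
    # Only the two-element form used in this thread is recognised,
    # not a single "--mtu=1400" argument.
    extra_args = (dind.get("daemonset") or {}).get("extraArgs") or []
    mtu = None
    if "--mtu" in extra_args:
        i = extra_args.index("--mtu")
        if i + 1 < len(extra_args):
            mtu = int(extra_args[i + 1])
    if mtu is None or mtu >= 1500:
        hints.append("consider an MTU below 1500 via dind.daemonset.extraArgs")

    # Step 3: the cleaner is off entirely (assumed key) or off for the host.
    cleaner = values.get("imageCleaner") or {}
    cleaner_off = cleaner.get("enabled") is False
    host_off = (cleaner.get("host") or {}).get("enabled") is False
    if not (cleaner_off or host_off):
        hints.append("disable the image cleaner or set imageCleaner.host.enabled to false")

    return hints


# The values posted above, written as a dictionary.
thread_values = {
    "dind": {
        "enabled": True,
        "daemonset": {
            "image": {"name": "docker", "tag": "19.03.14-dind"},
            "extraArgs": ["--mtu", "1400"],
        },
    },
    "imageCleaner": {"host": {"enabled": False}},
}

print(check_containerd_values(thread_values))
print(check_containerd_values({}))
```

The MTU hint is advisory only, since whether a smaller MTU is needed depends on the network configuration.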

rkevin-arch avatar Dec 13 '20 13:12 rkevin-arch

@rkevin-arch wieee thanks for your work on this! From a high level view, would it be reasonable to look for the following outcomes?

Outcomes aimed for

  • Documentation
    • About the docker CRI deprecation
      • Each k8s cluster uses one CRI, such as docker or containerd. (Right?)
      • Docker CRI is deprecated in k8s 1.20 and won't be supported in k8s 1.22.
    • What a BinderHub maintainer needs to know
      • Case 1: BinderHub's image building has assumed docker as a CRI, and requires configuration to work with containerd as a CRI.
      • Case 2: Code changes made us support both docker and containerd without adjustments, just ensure to use version >= X.
  • Code changes
    • Is there an intersection of docker / containerd functional configuration allowing us to support both without custom configuration?

consideRatio avatar Dec 13 '20 14:12 consideRatio

Each k8s cluster uses one CRI, such as docker or containerd. (Right?)

Pretty much, although I think you can mix and match different CRIs in one cluster (each node/kubelet uses one CRI, but for example Windows nodes currently still only have Docker as a CRI).

I don't think anything needs to be changed in the binderhub helm chart. If you set dind and imageCleaner helm values like I mentioned above, the current helm chart works like a charm with no issues. However, the documentation should make it clear that you're still using docker under the hood, and you're running a full docker daemon in containerd in a privileged container to support binderhub. (k8s itself does not interact with the dind docker daemon at all, and it should not affect the host containerd).

I think mentioning the configuration I had should be good enough for the documentation for now. Ideally this dind solution is a temporary one until repo2docker can support building without docker, but I don't think anything in binderhub needs to change for now.

rkevin-arch avatar Dec 13 '20 14:12 rkevin-arch

I did notice because dind and host containerd do not share an image cache, the image has to be pushed to the registry and repulled using the actual container runtime on the same host, which can be a bit slow during the first spawn.

This is always the case when using DIND (and a reason not to have DIND turned on by default IIRC)

As a long term thing having repo2docker gain the ability to talk to containerd (is containerd a full replacement for dockerd or does it only contain the bits needed to run images?) or podman would be nice.

I think we should maintain the "dind approach" because right now it allows for a transition phase, but it also allows large, public binderhubs like mybinder.org to limit the amount of resources consumed by builds and get some separation between the CRI of the host and the one that builds untrusted images for users. It isn't perfect but it does make people's lives a bit harder if they put nasty things in their Dockerfiles (I think).

If you enable dind you probably also want to add a separate partition to your nodes and use that partition to store images produced by dind. That keeps the garbage collection easier on the host and for dind. Otherwise the kubelet GC keeps deleting images because it thinks the drive is full. Yet it never makes any progress because all the space is consumed by the images produced by dind. If you can't use a separate partition I'd recommend switching the image cleaner for DIND to using a size in bytes as the threshold and not a percentage of inodes.
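To make the difference between the two threshold styles concrete, here is a small sketch with made-up numbers. The function `needs_cleaning` and the threshold kinds "relative" and "absolute" are hypothetical names for illustration; this is not the image cleaner's or the kubelet's actual logic, and it measures bytes rather than inodes to keep the example short.

```python
GB = 10**9


def needs_cleaning(used, total, threshold, kind):
    """Decide whether garbage collection should run (hypothetical helper).

    kind="relative": threshold is a percentage of the whole disk.
    kind="absolute": threshold is a byte budget for one component's own data.
    """
    if kind == "relative":
        return used / total * 100 > threshold
    if kind == "absolute":
        return used > threshold
    raise ValueError("unknown threshold kind: " + kind)


disk_total = 100 * GB
kubelet_images = 5 * GB   # images the kubelet is able to delete
dind_images = 85 * GB     # images only the dind daemon is able to delete

# Shared disk, percentage threshold: the kubelet sees 90% used and starts deleting.
print(needs_cleaning(kubelet_images + dind_images, disk_total, 80, "relative"))

# Even after the kubelet has deleted everything it owns, 85% is still above 80%.
print(needs_cleaning(dind_images, disk_total, 80, "relative"))

# A byte budget applied to dind's own data triggers cleanup where it can help.
print(needs_cleaning(dind_images, disk_total, 50 * GB, "absolute"))
```

With a separate partition for dind, each side's percentage only reflects data it can actually remove, which is the other way out of the same loop.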

betatim avatar Dec 13 '20 15:12 betatim

This is always the case when using DIND

Yeah that makes sense. Good to know!

(is containerd a full replacement for dockerd or does it only contain the bits needed to run images?)

I don't think containerd itself can build images. There seem to be multiple alternatives for building images, so if repo2docker can support one of them it would be nice (I'm not sure which one is better, though).

but it does make people's lives a bit harder if they put nasty things in their Dockerfiles (I think).

I actually thought this when I first found out about repo2docker, but from what I can tell there doesn't seem to be a way to break out of a container even if you control the Dockerfile. I've tried for a while and looked around, but pretty much all container breakouts (that aren't security bugs in Docker itself) rely on either the docker socket or a device in /dev being mounted inside the container, or if the container runs with elevated privileges to begin with. I don't think you can do any of these by controlling the Dockerfile. The worst you can do is to hog resources, which is a legit concern, but I don't think dind helps with that specifically.

If you enable dind you probably also want to add a separate partition to your nodes and use that partition to store images produced by dind.

Good to know! For our intended use case (it mostly powers Thebe with an image curated ourselves, and we won't be using it to spawn arbitrary images) it's not a big concern, but I think if you're adding documentation, this would definitely be a good thing to include.

rkevin-arch avatar Dec 13 '20 15:12 rkevin-arch

https://github.com/jupyterhub/binderhub/issues/1318 is a discussion on generalising BinderHub to not rely on Kubernetes. Docker is mentioned as a target but other container engines are in scope.

Is there any action we need to take on this particular issue?

manics avatar Jul 10 '21 11:07 manics