binderhub icon indicating copy to clipboard operation
binderhub copied to clipboard

Race condition: dind is required to be faster than build pod

Open consideRatio opened this issue 4 years ago • 6 comments

Summary

This is a BinderHub bug I think, so I transferred it here from a issue report in the mybinder.org-deploy repo.

Original issue from mybinder.org-deploy

A helm2 -> helm3 transition on prod worked fine in many ways, but restarting dind and image-cleaner turned out a bit problematic. Below I provide some logs of relevance but as I note that the errorings pods came from the same node, I also also inspected the k8s Events related to that node and found this.

  Type     Reason      Age   From            Message
  ----     ------      ----  ----            -------
  Warning  OOMKilling  50m   kernel-monitor  Memory cgroup out of memory: Kill process 1689494 (python) score 1943 or sacrifice child
Killed process 1689494 (python) total-vm:3887064kB, anon-rss:1976596kB, file-rss:16116kB, shmem-rss:0kB
  Warning  NodeSysctlChange    27m (x2 over 6h27m)  sysctl-monitor  {"unmanaged": {"net.ipv4.tcp_fastopen_key": "cc28e282-cc64df53-ec94ce54-ef86d970"}}
  Normal   NodeNotSchedulable  3s                   kubelet         Node gke-prod-user-202009-b9c03ca0-cphm status is now: NodeNotSchedulable

I guessed that it was no coincidence but could also find another node with the sysctl-monitor {"unmanaged": ... warning so I'm really not sure.

I went ahead and cordoned that node to help the k8s cluster recover by phasing it out over time as real users stop being scheduled there and leave.

prod-dind-rn5zb on node gke-prod-user-202009-b9c03ca0-cphm

time="2020-12-06T18:36:46.432649056Z" level=info msg="Starting up"
time="2020-12-06T18:36:46.434126429Z" level=warning msg="could not change group /var/run/dind/docker.sock to docker: group docker not found"
failed to load listeners: can't create unix socket /var/run/dind/docker.sock: is a directory

prod-image-cleaner-99jbz on node gke-prod-user-202009-b9c03ca0-cphm

2020-12-06 18:41:36,985 Pruning docker images when /var/lib/docker has 80.0% inodes or blocks used
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 531, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 207, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.8/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.8/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 230, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/image-cleaner.py", line 255, in <module>
    main()
  File "/usr/local/bin/image-cleaner.py", line 161, in main
    client = docker.from_env(version='auto')
  File "/usr/local/lib/python3.8/site-packages/docker/client.py", line 84, in from_env
    return cls(
  File "/usr/local/lib/python3.8/site-packages/docker/client.py", line 40, in __init__
    self.api = APIClient(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 190, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
    raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))

Event of a local build pod on the node

  Normal   Scheduled    34m                  default-scheduler  Successfully assigned prod/build-<censored> to gke-prod-user-202009-b9c03ca0-cphm
  Warning  FailedMount  32m                  kubelet            Unable to attach or mount volumes: unmounted volumes=[docker-socket], unattached volumes=[docker-push-secret default-token-jcbqc docker-socket]: timed out waiting for the condition
  Warning  FailedMount  12m (x9 over 30m)    kubelet            Unable to attach or mount volumes: unmounted volumes=[docker-socket], unattached volumes=[docker-socket docker-push-secret default-token-jcbqc]: timed out waiting for the condition
  Warning  FailedMount  4m5s (x23 over 34m)  kubelet            MountVolume.SetUp failed for volume "docker-socket" : hostPath type check failed: /var/run/dind/docker.sock is not a socket file

This build pod had been truncated to 63 characters btw if that could be of relevance.

consideRatio avatar Dec 06 '20 19:12 consideRatio

The "can't mount docker socket" problem looks like https://mybinder-sre.readthedocs.io/en/latest/operation_guide/common_problems.html#the-docker-in-docker-socket. We haven't had this happen in a long time. I don't think we ever figured out why/when exactly it happens but the fix is easy to apply.

Not sure if the other errors are related/connected to this though.

betatim avatar Dec 07 '20 06:12 betatim

@betatim nice the issue is diagnosed excellently! I wonder if this could be solvable by using the hostPath configuration options described here: https://kubernetes.io/docs/concepts/storage/volumes/#hostpath, hmm...

Exploration of hostPath volumes in BinderHub

https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/helm-chart/binderhub/values.yaml#L161-L162

https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/helm-chart/binderhub/templates/dind/daemonset.yaml#L55-L63

https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/binderhub/builder.py#L376

https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/binderhub/app.py#L348-L363

https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/binderhub/build.py#L238-L246

https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/helm-chart/binderhub/templates/image-cleaner.yaml#L67-L82

Conclusion

I fail to draw a conclusion, I remain confused about what exactly goes on and what is the root cause and if it can be avoided by a small change of hostPath configuration.

I know one thing we can do, which would add complexity though, which is to make build pods disallowed to be scheduled until the node is ready for a build pod as indicated by updating labels on the node that the build pod has a selector for. Hmmm, perhaps another alternative is to have a affinity for dind pods is also an option? Or I'm a bit confused, can you do build pods without dind also?

consideRatio avatar Dec 13 '20 14:12 consideRatio

The last time we investigated this I think we concluded that DirectoryOrCreate should prevent this problem from happening. But it doesn't so we left confused. :-/

Yes a build pod can run without DIND. In that case it will use the dockerd on the host directly. This is the default config of BinderHub. I think most people should enable DIND but it has some drawbacks and needs extra config in order to work well. Which is why "no DIND" is the default.

In the case were DIND is enabled it seems like a good move to prevent build pods from running on nodes that don't have a running and ready DIND pod. I am not sure if this would fix our problem though or if it would result in no build pods ever running on that node? The end result would still be that a node is broken/not fully usable somehow.

betatim avatar Dec 13 '20 15:12 betatim

I am having the same issue. Are there any further updates or insights people could offer?

harrylojames avatar Jan 10 '22 22:01 harrylojames

In a recent PR @sgaist and I discussed the idea of replacing the docker/podman file socket with a tcp. This requires some care with routing to preserve the affinity between the build node cache and the target image- the tcp connection to the docker daemon would need to be to a particular dind pod on the required build node, as opposed to using a service. Kubernetes doesn't support a combined daemonset-statefulset so it probably requires querying K8S for the dind pod IPs. However it would remove the race condition in the creation of the docker.sock file.

manics avatar Jan 26 '23 20:01 manics

@manics there might already be some logic in build.py that could be reused for getting the IP of the pod we would like to use.

On a side note, AFAIK, there's currently no guarantee that you'll use the same pod twice for the same image unless you use the sticky_builds option and/or build_node_selector.

One question though, would the use of a tcp socket replace the current unix socket or be an additional option ?

sgaist avatar Jan 27 '23 09:01 sgaist