Race condition: dind is required to be faster than build pod
Summary
This is a BinderHub bug I think, so I transferred it here from an issue report in the mybinder.org-deploy repo.
Original issue from mybinder.org-deploy
A helm2 -> helm3 transition on prod worked fine in many ways, but restarting dind and image-cleaner turned out to be a bit problematic. Below I provide some logs of relevance, but as I noted that the erroring pods came from the same node, I also inspected the k8s Events related to that node and found this.
Type Reason Age From Message
---- ------ ---- ---- -------
Warning OOMKilling 50m kernel-monitor Memory cgroup out of memory: Kill process 1689494 (python) score 1943 or sacrifice child
Killed process 1689494 (python) total-vm:3887064kB, anon-rss:1976596kB, file-rss:16116kB, shmem-rss:0kB
Warning NodeSysctlChange 27m (x2 over 6h27m) sysctl-monitor {"unmanaged": {"net.ipv4.tcp_fastopen_key": "cc28e282-cc64df53-ec94ce54-ef86d970"}}
Normal NodeNotSchedulable 3s kubelet Node gke-prod-user-202009-b9c03ca0-cphm status is now: NodeNotSchedulable
I guessed that it was no coincidence, but I could also find another node with the sysctl-monitor {"unmanaged": ...} warning, so I'm really not sure.
I went ahead and cordoned that node to help the k8s cluster recover by phasing it out over time as real users stop being scheduled there and leave.
prod-dind-rn5zb on node gke-prod-user-202009-b9c03ca0-cphm
time="2020-12-06T18:36:46.432649056Z" level=info msg="Starting up"
time="2020-12-06T18:36:46.434126429Z" level=warning msg="could not change group /var/run/dind/docker.sock to docker: group docker not found"
failed to load listeners: can't create unix socket /var/run/dind/docker.sock: is a directory
prod-image-cleaner-99jbz on node gke-prod-user-202009-b9c03ca0-cphm
2020-12-06 18:41:36,985 Pruning docker images when /var/lib/docker has 80.0% inodes or blocks used
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/usr/local/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/usr/local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 531, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
self.send(msg)
File "/usr/local/lib/python3.8/http/client.py", line 950, in send
self.connect()
File "/usr/local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 207, in _retrieve_server_version
return self.version(api_version=False)["ApiVersion"]
File "/usr/local/lib/python3.8/site-packages/docker/api/daemon.py", line 181, in version
return self._result(self._get(url), json=True)
File "/usr/local/lib/python3.8/site-packages/docker/utils/decorators.py", line 46, in inner
return f(self, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 230, in _get
return self.get(url, **self._set_request_timeout(kwargs))
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/image-cleaner.py", line 255, in <module>
main()
File "/usr/local/bin/image-cleaner.py", line 161, in main
client = docker.from_env(version='auto')
File "/usr/local/lib/python3.8/site-packages/docker/client.py", line 84, in from_env
return cls(
File "/usr/local/lib/python3.8/site-packages/docker/client.py", line 40, in __init__
self.api = APIClient(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 190, in __init__
self._version = self._retrieve_server_version()
File "/usr/local/lib/python3.8/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
raise DockerException(
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', ConnectionRefusedError(111, 'Connection refused'))
Events of a build pod on the node
Normal Scheduled 34m default-scheduler Successfully assigned prod/build-<censored> to gke-prod-user-202009-b9c03ca0-cphm
Warning FailedMount 32m kubelet Unable to attach or mount volumes: unmounted volumes=[docker-socket], unattached volumes=[docker-push-secret default-token-jcbqc docker-socket]: timed out waiting for the condition
Warning FailedMount 12m (x9 over 30m) kubelet Unable to attach or mount volumes: unmounted volumes=[docker-socket], unattached volumes=[docker-socket docker-push-secret default-token-jcbqc]: timed out waiting for the condition
Warning FailedMount 4m5s (x23 over 34m) kubelet MountVolume.SetUp failed for volume "docker-socket" : hostPath type check failed: /var/run/dind/docker.sock is not a socket file
This build pod's name had been truncated to 63 characters btw, in case that could be of relevance.
The "can't mount docker socket" problem looks like https://mybinder-sre.readthedocs.io/en/latest/operation_guide/common_problems.html#the-docker-in-docker-socket. We haven't had this happen in a long time. I don't think we ever figured out why/when exactly it happens but the fix is easy to apply.
Not sure if the other errors are related/connected to this though.
@betatim nice, the issue is excellently diagnosed! I wonder if this could be solved by using the hostPath configuration options described here: https://kubernetes.io/docs/concepts/storage/volumes/#hostpath, hmm...
Exploration of hostPath volumes in BinderHub
https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/helm-chart/binderhub/values.yaml#L161-L162
https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/helm-chart/binderhub/templates/dind/daemonset.yaml#L55-L63
https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/binderhub/builder.py#L376
https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/binderhub/app.py#L348-L363
https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/binderhub/build.py#L238-L246
https://github.com/jupyterhub/binderhub/blob/361db829bf90b1add6c7dd0f9de12b509c91a179/helm-chart/binderhub/templates/image-cleaner.yaml#L67-L82
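To make the interaction concrete, here is a rough sketch of the two hostPath pieces the links above point at, written with the kubernetes Python client that build.py already uses. The field values are my reading of the chart and of the error messages above, not verbatim copies of the code.

```python
from kubernetes import client

# dind daemonset side: the *directory* /var/run/dind is a hostPath volume
# (dind.hostSocketDir in values.yaml), so it exists on the host and dockerd
# inside the dind pod creates docker.sock inside it. DirectoryOrCreate is an
# assumption here, based on the discussion below.
dind_socket_dir = client.V1Volume(
    name="dind-socket",
    host_path=client.V1HostPathVolumeSource(
        path="/var/run/dind",
        type="DirectoryOrCreate",
    ),
)

# build pod side (binderhub/build.py): the *socket file* itself is mounted
# with a "Socket" type check. If dockerd has not created the socket yet,
# kubelet's check fails with "is not a socket file" (the FailedMount events
# above); and if the path already exists as a directory, dockerd itself fails
# with "can't create unix socket ...: is a directory" (the dind log above).
build_docker_socket = client.V1Volume(
    name="docker-socket",
    host_path=client.V1HostPathVolumeSource(
        path="/var/run/dind/docker.sock",
        type="Socket",
    ),
)
```

So the ordering matters: dind has to have created the socket before the build pod's mount can succeed, which is the race in the title.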
Conclusion
I fail to draw a conclusion. I remain confused about what exactly goes on, what the root cause is, and whether it can be avoided by a small change to the hostPath configuration.
I know one thing we can do, though it would add complexity: disallow build pods from being scheduled until the node is ready for them, as indicated by labels on the node that the build pods have a selector for. Hmm, perhaps another alternative is to give build pods an affinity for dind pods? Or I'm a bit confused, can you do build pods without dind as well?
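Purely as a hypothetical illustration of the "update labels on the node" idea (nothing like this exists in BinderHub today; the label and selector names are made up), a tiny controller could watch dind pods and toggle a node label that build pods then require via a nodeSelector:

```python
"""Hypothetical sketch: mark nodes whose dind pod is Ready, so build pods
with a matching nodeSelector are only scheduled once dockerd is up."""
from kubernetes import client, config, watch

READY_LABEL = "binderhub.jupyter.org/dind-ready"  # made-up label name

config.load_incluster_config()
v1 = client.CoreV1Api()

# Assumes dind pods carry a "component=dind" label; adjust to the chart's real labels.
for event in watch.Watch().stream(
    v1.list_pod_for_all_namespaces, label_selector="component=dind"
):
    pod = event["object"]
    node = pod.spec.node_name
    if not node:
        continue
    ready = any(
        cond.type == "Ready" and cond.status == "True"
        for cond in (pod.status.conditions or [])
    )
    # Build pods would then need nodeSelector: {"binderhub.jupyter.org/dind-ready": "true"}.
    # Patching the label to None removes it again while dind is not ready.
    v1.patch_node(
        node, {"metadata": {"labels": {READY_LABEL: "true" if ready else None}}}
    )
```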
The last time we investigated this I think we concluded that DirectoryOrCreate should prevent this problem from happening. But it doesn't, so we were left confused. :-/
Yes, a build pod can run without DIND. In that case it will use the dockerd on the host directly; this is the default config of BinderHub. I think most people should enable DIND, but it has some drawbacks and needs extra config in order to work well, which is why "no DIND" is the default.
In the case where DIND is enabled, it seems like a good move to prevent build pods from running on nodes that don't have a running and ready DIND pod. I am not sure if this would fix our problem though, or if it would just result in no build pods ever running on that node. The end result would still be that a node is broken/not fully usable somehow.
I am having the same issue. Are there any further updates or insights people could offer?
In a recent PR @sgaist and I discussed the idea of replacing the docker/podman file socket with a TCP socket. This requires some care with routing to preserve the affinity between the build node's cache and the target image: the TCP connection to the docker daemon would need to go to the particular dind pod on the required build node, as opposed to going through a service. Kubernetes doesn't support a combined daemonset-statefulset, so it probably requires querying K8s for the dind pod IPs. However, it would remove the race condition in the creation of the docker.sock file.
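For illustration, a minimal sketch of that lookup, assuming dockerd in the dind pods were listening on plain TCP port 2375 and that the pods carry a `component=dind` label (neither is the case in the current chart):

```python
import docker
from kubernetes import client, config


def docker_client_for_node(node_name: str, namespace: str = "binder") -> docker.DockerClient:
    """Return a docker client talking to the dind pod running on a given node."""
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(
        namespace,
        label_selector="component=dind",              # assumed label
        field_selector=f"spec.nodeName={node_name}",  # pin to the build node's dind pod
    )
    if not pods.items:
        raise RuntimeError(f"no dind pod found on node {node_name}")
    pod_ip = pods.items[0].status.pod_ip
    # 2375 is the conventional unencrypted dockerd TCP port; TLS would need extra wiring.
    return docker.DockerClient(base_url=f"tcp://{pod_ip}:2375")
```

That keeps the cache affinity (the connection goes to the dind pod on the chosen build node) without ever touching docker.sock.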
@manics there might already be some logic in build.py that could be reused for getting the IP of the pod we would like to use.
On a side note, AFAIK, there's currently no guarantee that you'll use the same pod twice for the same image unless you use the sticky_builds option and/or build_node_selector.
One question though: would the use of a TCP socket replace the current unix socket, or be an additional option?