dask-kubernetes
Re-create the socket object in case of a connection failure
When connecting to my Dask cluster using
cluster = HelmCluster(release_name='foo')
I kept getting the following error: ConnectionError: kubectl port forward failed. After some debugging, it turned out that the initial connection attempt on the socket failed because the port forwarding was not ready yet, and all 99 subsequent connect_ex() calls then failed because the socket object was left in an unusable state. Re-creating the socket object on each retry fixed the issue. There is a small cost to this, but given that we have a sleep(2) in that loop, performance is hardly a concern.
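For illustration, the retry pattern described above could look roughly like the sketch below. This is not the actual dask-kubernetes code: the helper name wait_for_port and its parameters (host, port, retries, delay) are made up for this example. The point is only that a new socket is created on every attempt instead of reusing one that has already failed to connect.

```python
import socket
import time


def wait_for_port(host, port, retries=10, delay=2.0):
    """Return True once a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; not the dask-kubernetes implementation.
    """
    for attempt in range(retries):
        # Create a fresh socket for every attempt. POSIX leaves the state of
        # a socket unspecified after a failed connect(), so it is not reused.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            if sock.connect_ex((host, port)) == 0:
                return True
        # Wait before the next attempt, but not after the last one.
        if attempt < retries - 1:
            time.sleep(delay)
    return False
```

Something like wait_for_port("localhost", 8786) would then be called right after starting the port forward (the port number here is only an example). The retries and delay arguments make the total waiting time explicit, which is relevant to the question about the retry count.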
Also, do we really need 100 retry attempts? With sleep(2) between attempts, that means about 200 seconds of retries, so more than 3 minutes until users get an answer back. I would say that even on the slowest machines the port forwarding should be ready after a few seconds. The current implementation just gives the impression that everything hangs.