starting dask.distributed.Client with default settings results in endless restarting of workers
What happened:
- Starting Client locally with default settings results in endless restarting of workers (with processes=True).
- Note: this started after updating my conda environment from Dask 2021.3.0 and Python 3.8.8 (it was fine with those versions before).
What you expected to happen: for the client to start
Minimal Complete Verifiable Example:
This fails:
from dask.distributed import Client
client = Client(processes=True)
This starts:
from dask.distributed import Client
client = Client(processes=False)
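Side note for anyone reproducing this from a plain script rather than a notebook: on Windows, multiprocessing spawns worker processes that re-import the main module, so the Dask documentation recommends guarding cluster creation behind a main check; a missing guard can produce a very similar endless-restart loop. A minimal sketch of that pattern (it does not apply inside Jupyter, but is worth ruling out when testing from a script):

from dask.distributed import Client

if __name__ == "__main__":
    # Without this guard, each spawned worker process re-executes the
    # module top level, tries to start its own Client, and the nannies
    # keep restarting workers.
    client = Client(processes=True)
    print(client)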
Anything else we need to know?:
Running from a Jupyter notebook
Environment:
- Dask version: 2022.9.0
- Distributed version: 2022.9.0
- Python version: 3.9.13
- Operating System: Windows 10
- Install method: conda-forge
Error output on failing example:
2022-09-14 09:17:43,926 - distributed.nanny - WARNING - Restarting worker
2022-09-14 09:17:43,939 - distributed.nanny - WARNING - Restarting worker
2022-09-14 09:17:43,949 - distributed.nanny - WARNING - Restarting worker
2022-09-14 09:17:43,962 - distributed.nanny - WARNING - Restarting worker
2022-09-14 09:17:43,969 - distributed.nanny - WARNING - Restarting worker
Traceback (most recent call last):
File "c:\Users\myuser\Miniconda3\envs\sim\lib\site-packages\distributed\nanny.py", line 822, in _wait_until_connected
msg = self.init_result_q.get_nowait()
File "c:\Users\myuser\Miniconda3\envs\sim\lib\multiprocessing\queues.py", line 135, in get_nowait
return self.get(False)
File "c:\Users\myuser\Miniconda3\envs\sim\lib\multiprocessing\queues.py", line 116, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\myuser\Miniconda3\envs\sim\lib\site-packages\distributed\utils.py", line 799, in wrapper
return await func(*args, **kwargs)
File "c:\Users\myuser\Miniconda3\envs\sim\lib\site-packages\distributed\nanny.py", line 539, in _on_worker_exit
await self.instantiate()
File "c:\Users\myuser\Miniconda3\envs\sim\lib\site-packages\distributed\nanny.py", line 438, in instantiate
result = await self.process.start()
File "c:\Users\myuser\Miniconda3\envs\sim\lib\site-packages\distributed\nanny.py", line 695, in start
msg = await self._wait_until_connected(uid)
File "c:\Users\myuser\Miniconda3\envs\sim\lib\site-packages\distributed\nanny.py", line 824, in _wait_until_connected
await asyncio.sleep(self._init_msg_interval)
File "c:\Users\myuser\Miniconda3\envs\sim\lib\asyncio\tasks.py", line 652, in sleep
return await future
asyncio.exceptions.CancelledError
...
A little update: I progressively rolled back Dask versions from 2022.9.0 to 2021.3.0, testing each one, and saw the same issue. In theory that shouldn't happen, since 2021.3.0 was fine before.
So, next, to rule out any environment oddities, I reverted to my old conda environment (via a YAML backup) with Python 3.8.8 and Dask 2021.3.0, and I am still seeing the same restarting-worker behavior... This time I also ran it with the plain interpreter (not Jupyter) and it made no difference. A bit perplexing...
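One thing that might surface the underlying exception behind the restarts: build the cluster explicitly and lower the log-silencing threshold. A sketch, using LocalCluster's silence_logs parameter (it takes a logging level, logging.WARN by default; the level chosen here is illustrative):

import logging
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    # Show nanny/worker logs below WARNING, so the real error that kills
    # each worker is printed instead of just "Restarting worker".
    cluster = LocalCluster(processes=True, silence_logs=logging.DEBUG)
    client = Client(cluster)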
This is happening for me too, but only in Zeppelin (Jupyter is working fine). Specifying processes=True (which is the default) for LocalCluster or Client makes the paragraph hang forever in Zeppelin (see the examples below):
%python
from dask.distributed import Client
client = Client()
client
or
%python
from dask.distributed import LocalCluster
cluster = LocalCluster()
cluster
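For what it's worth, the threads-only variant from the original report presumably works as a stopgap in Zeppelin as well (untested here); a sketch:

%python
from dask.distributed import Client
# Workaround from the original report: keep everything in one process
# (threads only), so no subprocess spawning is involved.
client = Client(processes=False)
client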
From the Dask documentation, the client:
will check your local Dask config and environment variables to see if connection information has been specified. If not it will create an instance of LocalCluster and use that.
So the problem is rather in LocalCluster with the processes param set to True, for some reason.
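To make that default explicit: the hanging Client() call above should be roughly equivalent to constructing the cluster by hand, per the docs quoted above. A sketch:

%python
from dask.distributed import Client, LocalCluster
# What Client() does by default: create a LocalCluster (with
# processes=True) and connect to it. If this hangs too, the fault is
# in LocalCluster's process spawning rather than in Client itself.
cluster = LocalCluster(processes=True)
client = Client(cluster)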
Could you please advise why this is only happening in Zeppelin? We are really stuck...
Thank you in advance.
After combing through old issues, I believe my issue is a duplicate of this one (https://github.com/dask/distributed/issues/5574). It's kind of amazing this issue still exists after all these years; it seems to be a very low priority for Microsoft... This is the corresponding issue in VSCode's repo (https://github.com/microsoft/vscode-jupyter/issues/2962).
My problem went away once I stopped using the VSCode Python Interactive Window (I switched back to Atom and its Hydrogen plugin). Of course, running the script from the shell works too.