
Local Python env Cluster

DPeterK opened this pull request 4 years ago

  • [x] Closes https://github.com/dask/dask-labextension/issues/82
  • [ ] Tests added / passed
  • [x] Passes black distributed / flake8 distributed / isort distributed

Add a new distributed cluster manager that provides a Scheduler and Workers running on localhost, but launches them from a different, user-specified Python executable.

Rationale

This is particularly desirable when using the dask labextension with JupyterLab, as dask-labextension launches a LocalCluster using the same Python executable that runs the JupyterLab instance. That is not always what you want, notably on cloud JupyterLab services (such as SageMaker Studio or AzureML) where you have no control over the Python executable running the JupyterLab instance.
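For illustration, usage might look something like the sketch below. The class name LocalEnvCluster is inferred from the PR's test module (test_local_env.py); the import path, the python= keyword, and the example path are assumptions rather than the confirmed API:

```python
from dask.distributed import Client

# Hypothetical import path and class name, inferred from the PR's
# test_local_env.py; the `python=` keyword is likewise an assumption.
from distributed.deploy.local_env import LocalEnvCluster

# Scheduler and workers run on localhost, but under the interpreter of a
# separately managed environment, not the one serving JupyterLab itself.
cluster = LocalEnvCluster(python="/opt/conda/envs/workers/bin/python")
client = Client(cluster)
```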

Note: no tests yet! I'd appreciate some input on how I might test this...

cc @jacobtomlinson @jrbourbeau

DPeterK (Jul 01 '21)

This is looking good @DPeterK. Just needs some tests now. I would look at the SSHCluster and LocalCluster tests for some ideas.

I expect you will need a fixture which figures out the current conda env's name and uses that. Testing multiple envs may be tricky from a setup point of view.
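For example, a minimal fixture might look like the sketch below, where sys.executable stands in for resolving the conda env by name, since the interpreter running the tests is the active env's Python:

```python
import sys

import pytest


@pytest.fixture
def current_python():
    # The interpreter running the test suite belongs to the active conda
    # env, so it is an executable the cluster manager can safely target.
    return sys.executable
```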

jacobtomlinson (Jul 12 '21)

@jrbourbeau usually has good pointers on testing.

jacobtomlinson (Jul 12 '21)

Can one of the admins verify this patch?

GPUtester (Aug 02 '21)

I've added a module with tests for this cluster manager, at least to show how I intend to test it, as the tests aren't currently passing. There seem to be two problems:

  1. bad integration with the IO loop (?)
  2. the cluster isn't closing cleanly.

Here's the traceback in each case (a sketch addressing both follows the second traceback)...

1. bad integration with the IO loop:

```
Traceback (most recent call last):
  File "/.../distributed/distributed/deploy/tests/test_local_env.py", line 82, in test_set_env
    result = await client.submit(f)
  File "/.../distributed/distributed/client.py", line 427, in __await__
    return self.result().__await__()
AttributeError: 'int' object has no attribute '__await__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/.../distributed/distributed/utils.py", line 671, in log_errors
    yield
  File "/.../distributed/distributed/client.py", line 1358, in _close
    await asyncio.wait_for(
  File "/.../miniconda3/envs/dask-distributed/lib/python3.8/asyncio/tasks.py", line 475, in wait_for
    fut = ensure_future(fut, loop=loop)
  File "/.../miniconda3/envs/dask-distributed/lib/python3.8/asyncio/tasks.py", line 678, in ensure_future
    raise ValueError('The future belongs to a different loop than '
ValueError: The future belongs to a different loop than the one specified as the loop argument
```

2. the cluster isn't closing cleanly:

```
Traceback (most recent call last):
  File "/.../distributed/distributed/client.py", line 1209, in __del__
    self.close()
  File "/.../distributed/distributed/client.py", line 1449, in close
    sync(self.loop, self._close, fast=True, callback_timeout=timeout)
  File "/.../distributed/distributed/utils.py", line 348, in sync
    raise TimeoutError("timed out after %s s." % (callback_timeout,))
asyncio.exceptions.TimeoutError: timed out after 20 s.
```
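For what it's worth, both tracebacks are consistent with a client or cluster being created on one event loop and closed on (or garbage-collected after) another. A test written entirely against the asynchronous API keeps everything on one loop and closes the client before the cluster. A hedged sketch is below, reusing the assumed LocalEnvCluster name and python= keyword from above, and using pytest-asyncio for the async test:

```python
import sys

import pytest

from distributed import Client

# Hypothetical import, as in the earlier sketch.
from distributed.deploy.local_env import LocalEnvCluster


@pytest.mark.asyncio
async def test_submit_same_loop():
    # asynchronous=True keeps cluster and client on the event loop this
    # test is already running on, avoiding the "future belongs to a
    # different loop" ValueError; the async context managers then close
    # the client before the cluster, avoiding the teardown timeout.
    async with LocalEnvCluster(python=sys.executable,  # assumed keyword
                               asynchronous=True) as cluster:
        async with Client(cluster, asynchronous=True) as client:
            result = await client.submit(lambda: 1 + 1)
            assert result == 2
```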

I'll dig into these errors, but I thought it was worth pushing the tests and noting the errors in case they are known / common to others...

I'll also fix the conflict with upstream.

DPeterK (Aug 02 '21)

Hello, is there a reason why this functionality hasn't been finalized?

panas2567 (Oct 09 '25)

I think it was just a lack of review capacity. Sorry this has sat here for so long @DPeterK! @panas2567 do you have any interest in trying to push these changes over the line?

jacobtomlinson (Oct 09 '25)

Thanks @jacobtomlinson for such a quick response. The case I'm solving now is that I'm setting up a JupyterHub environment in AzureML for my team, so I have to tackle the dependency differences between the default jupyter_env conda env (the scheduler and workers' env) and the notebook's kernel (the client's env).

It would directly solve this challenge if we were able to select a custom environment for the cluster created from the Dask extension/widget in JupyterHub.
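For what it's worth, dask-labextension already reads its cluster factory from the labextension.factory section of dask's configuration, so once a cluster class like this lands it could be selected there. A hedged sketch, reusing the assumed module path, class name, and python keyword from the sketches above:

```python
import dask.config

# All values below are assumptions carried over from the earlier sketches;
# only the "labextension.factory" config section itself comes from
# dask-labextension's documented configuration.
dask.config.set({
    "labextension.factory.module": "distributed.deploy.local_env",
    "labextension.factory.class": "LocalEnvCluster",
    "labextension.factory.args": [],
    "labextension.factory.kwargs": {
        "python": "/opt/conda/envs/workers/bin/python",
    },
})
```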

panas2567 (Oct 09 '25)

I think this was the same use case that @DPeterK had.

jacobtomlinson (Oct 09 '25)

I see. I'm happy to help if there is anything still to be implemented.

panas2567 (Oct 10 '25)

@panas2567 I think the next steps here would be to resolve the merge conflicts and then test this out and see if it solves your problem. Then report back here.

jacobtomlinson (Oct 10 '25)