
DaskEngine: Internal vs. External Clusters

Open · chukarsten opened this issue 3 years ago · 1 comment

In #2667, we added the ability for AutoMLSearch to create its own Dask LocalCluster to perform work in parallel. An important aspect of using a Dask LocalCluster is shutting it down when AutoMLSearch is done with it. For LocalClusters created by AutoMLSearch, it seems reasonable for the object to shut down the cluster once it's done. But for externally provided clusters, that might not make sense and might not even be possible.

Successful completion of this issue involves refactoring the DaskEngine such that:

  1. When AutoMLSearch receives an engine instance as a parameter, it does not shut down the engine. This case should simulate a consumer who is using AutoML and has access to an external Dask cluster.
  2. When AutoMLSearch creates its own engine via the convenience string, it shuts down the engine when it is done. This case should simulate a consumer who is using AutoML on their local machine.
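A minimal sketch of the two cases above, assuming the search simply records whether it created the engine itself. `FakeCluster`, `SketchDaskEngine` and `SketchAutoMLSearch` are hypothetical stand-ins, not evalml's actual classes; `FakeCluster` only mimics the `close()` method of a real `distributed.LocalCluster` so the example runs without Dask installed.

```python
class FakeCluster:
    """Stand-in for a Dask cluster such as distributed.LocalCluster; it only records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class SketchDaskEngine:
    """Hypothetical engine wrapping a cluster; shutdown() closes whatever it wraps."""

    def __init__(self, cluster):
        self.cluster = cluster

    def shutdown(self):
        self.cluster.close()


class SketchAutoMLSearch:
    """Hypothetical search that only shuts down engines it created itself."""

    def __init__(self, engine):
        if isinstance(engine, str):
            if engine != "dask":
                raise ValueError(f"unsupported engine string: {engine!r}")
            # Case 2: convenience string, so the search creates and owns the engine.
            self.engine = SketchDaskEngine(FakeCluster())
            self._owns_engine = True
        else:
            # Case 1: caller-provided instance, so the caller keeps ownership.
            self.engine = engine
            self._owns_engine = False

    def search(self):
        try:
            pass  # pipelines would be evaluated on self.engine here
        finally:
            if self._owns_engine:
                self.engine.shutdown()


# Usage: an external engine survives the search, an internal one is shut down.
external = SketchDaskEngine(FakeCluster())
SketchAutoMLSearch(external).search()
print(external.cluster.closed)  # False

internal = SketchAutoMLSearch("dask")
internal.search()
print(internal.engine.cluster.closed)  # True
```

Putting the shutdown in a `finally` means an engine the search owns is released even if the search raises partway through.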

chukarsten · Sep 01 '21 00:09

Makes sense.

Alternate proposal for the behavior here: rather than conditioning shutdown on whether the engine info was provided via a client instance or a string, what if we do this:

  • Define LocalDaskClient to be used when the dask engine spins up its own cluster
  • Define a teardown or similar method on that class. Make it a no-op on the base Dask client, and have it actually shut down the cluster on the local subclass
  • Call teardown when the Dask engine is finished
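A minimal sketch of the teardown-based alternative, assuming a small client class hierarchy. `LocalDaskClient` is the name proposed above; `DaskClient`, `FakeCluster` and `SketchDaskEngine` are hypothetical stand-ins rather than evalml or Dask APIs. `FakeCluster` only mimics `distributed.LocalCluster.close()`, and `submit` here runs the job locally instead of returning a Dask future.

```python
class FakeCluster:
    """Stand-in for a Dask cluster such as distributed.LocalCluster; it only records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class DaskClient:
    """Hypothetical base wrapper for a cluster the caller already owns."""

    def __init__(self, cluster):
        self.cluster = cluster

    def submit(self, fn):
        # Stand-in for dispatching work to the cluster; here it just runs locally.
        return fn()

    def teardown(self):
        # No-op: an external cluster is the caller's responsibility.
        pass


class LocalDaskClient(DaskClient):
    """Wrapper used when the engine spins up its own cluster."""

    def __init__(self):
        super().__init__(FakeCluster())

    def teardown(self):
        # The engine created this cluster, so closing it here is safe.
        self.cluster.close()


class SketchDaskEngine:
    """Hypothetical engine that always calls teardown() when it is finished."""

    def __init__(self, client=None):
        self.client = client if client is not None else LocalDaskClient()

    def run(self, jobs):
        return [self.client.submit(job) for job in jobs]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Unconditional call; the client subclass decides what teardown means.
        self.client.teardown()
        return False


# Usage
shared = FakeCluster()
with SketchDaskEngine(DaskClient(shared)) as engine:
    print(engine.run([lambda: 1, lambda: 2]))  # [1, 2]
print(shared.closed)  # False

with SketchDaskEngine() as engine:
    owned = engine.client.cluster
print(owned.closed)  # True
```

With this design the engine never needs to know how it was constructed: it calls teardown unconditionally, and the client type carries the ownership decision.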

Does that make any sense?

dsherry · Sep 01 '21 19:09