Long wait time to get dask workers

Open jbusecke opened this issue 3 years ago • 3 comments

I have been noticing very long wait times to get dask workers to come online lately.

It just took me ~30 min to get any workers on the pangeo google cloud deployment.

Is there a way to resolve this? @rabernat suggested that "the cluster is maxed out".

For completeness, this is what I do in my notebook (pretty much the recommended code):

from dask_gateway import GatewayCluster

cluster = GatewayCluster()
# cluster.adapt(minimum=4, maximum=40)  # adaptive scaling; alternatively, scale to a fixed size:
cluster.scale(10)
cluster

jbusecke avatar Apr 30 '21 16:04 jbusecke

Apparently we have a 100 vCPU limit on the cluster, and today we were at that limit.

I just bumped it to 200. (For those with access, the page is here: https://console.cloud.google.com/kubernetes/clusters/details/us-central1-b/pangeo-uscentral1b/details?project=pangeo-181919)
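As a rough illustration of why workers stay pending at the limit, here is a minimal sketch of the arithmetic. The function name and the 2 vCPUs per worker are hypothetical, not values taken from this deployment, and the sketch ignores notebook pods, node overhead, and node spin-up time.

```python
def workers_that_fit(vcpu_limit, vcpus_in_use, vcpus_per_worker):
    """Return how many more workers fit under a cluster-wide vCPU limit."""
    # Free capacity can never be negative, even if usage exceeds the limit.
    free_vcpus = max(vcpu_limit - vcpus_in_use, 0)
    return free_vcpus // vcpus_per_worker


# Hypothetical scenario: 100 vCPUs already in use, 2 vCPUs per worker.
at_old_limit = workers_that_fit(100, 100, 2)
at_new_limit = workers_that_fit(200, 100, 2)
print(at_old_limit, at_new_limit)
```

Under these assumed numbers, a cluster.scale(10) request cannot be satisfied at the old limit until other users release capacity, while the raised limit leaves room for it.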

Did that resolve the issue?

rabernat avatar Apr 30 '21 16:04 rabernat

I was eventually able to get workers even before raising this issue, but I'll keep an eye on it over the coming days.

jbusecke avatar Apr 30 '21 18:04 jbusecke

Quick update: Right now I am getting dask workers quickly! Thanks for the adjustment.

jbusecke avatar Apr 30 '21 19:04 jbusecke