Provide feedback to users on cluster creation status
This is a somewhat fuzzy feature request, apologies if it's ill-formed. I'm also focused primarily on the Kubernetes backend, but the same concepts likely apply to other auto-scaling backends.
Creating and scaling a cluster involves several steps, each of which can take a variable amount of time. I'd love to provide some kind of mechanism to the user to see what is going on at each of these stages:
- Creating a cluster makes a scheduler pod. This might involve the Kubernetes autoscaler creating a new node and downloading the scheduler image.
- Scaling a cluster makes many new worker pods. This again might involve the Kubernetes autoscaler making one or more new nodes and downloading the worker image.
With access to the Kubernetes API, we can view the events to see where a cluster / worker is in the process of being created (triggered an autoscaling event, assigned to a node, downloading an image, starting). This is at a finer resolution than what's available through the dask / dask-gateway API. So I wonder if there's a way to filter those events to a particular user and somehow make them available.
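To make the filtering idea a bit more concrete, here is a rough sketch. It assumes the events have already been fetched and are plain dicts shaped like Kubernetes Event objects (`involvedObject`, `reason`, `message`), and that we already know which pod names belong to the user's cluster. The function name, pod names, and sample data are all made up for illustration.

```python
from typing import Dict, Iterable, List, Set


def filter_cluster_events(events: Iterable[Dict], pod_names: Set[str]) -> List[str]:
    """Return readable messages for events that involve one of the given pods."""
    messages = []
    for event in events:
        involved = event.get("involvedObject", {})
        # Keep only pod events whose pod belongs to this cluster.
        if involved.get("kind") == "Pod" and involved.get("name") in pod_names:
            reason = event.get("reason", "")
            message = event.get("message", "")
            messages.append(f"{involved['name']}: {reason}: {message}")
    return messages


# Hypothetical sample data: two pods from one cluster, one from another.
sample_events = [
    {"involvedObject": {"kind": "Pod", "name": "scheduler-abc"},
     "reason": "Scheduled", "message": "assigned to node-1"},
    {"involvedObject": {"kind": "Pod", "name": "worker-other"},
     "reason": "Pulling", "message": "pulling image"},
    {"involvedObject": {"kind": "Pod", "name": "worker-abc-1"},
     "reason": "TriggeredScaleUp", "message": "pod triggered scale-up"},
]

filtered = filter_cluster_events(sample_events, {"scheduler-abc", "worker-abc-1"})
```

In a real backend the pod names would presumably come from whatever the gateway already tracks for each cluster, so a user would only ever see events for their own pods.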
We could have a separate API endpoint that just lists the backend's messages associated with a given cluster:
import dask_gateway

gateway = dask_gateway.Gateway()
cluster = gateway.new_cluster()  # blocks until ready
cluster.scale(2)  # non-blocking
# cluster_events is the proposed endpoint; it does not exist today
events = gateway.cluster_events(cluster.id)  # type: List[str]
I think we've used asynchronous iterators for providing status updates elsewhere in Dask. Perhaps they could be used here, by logging to a StreamHandler when some event is done?
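A minimal sketch of what the async-iterator variant might look like, using only the standard library. Instead of a `StreamHandler` it uses a small custom `logging.Handler` that pushes messages onto an `asyncio.Queue`, which seemed easier to consume from an async generator. All names here (`QueueEventHandler`, `iter_events`, the logger name, the event strings) are hypothetical and not part of dask-gateway.

```python
import asyncio
import logging
from typing import AsyncIterator, List


class QueueEventHandler(logging.Handler):
    """Logging handler that forwards each formatted message to an asyncio queue."""

    def __init__(self, queue: asyncio.Queue) -> None:
        super().__init__()
        self.queue = queue

    def emit(self, record: logging.LogRecord) -> None:
        self.queue.put_nowait(record.getMessage())


async def iter_events(queue: asyncio.Queue) -> AsyncIterator[str]:
    """Yield messages from the queue until a None sentinel arrives."""
    while True:
        message = await queue.get()
        if message is None:
            break
        yield message


async def demo() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    logger = logging.getLogger("cluster-events-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo from printing via the root logger
    handler = QueueEventHandler(queue)
    logger.addHandler(handler)

    # Stand-in for the backend reporting progress as each step completes.
    for step in ("triggered scale-up", "assigned to node", "pulling image", "started"):
        logger.info(step)
    queue.put_nowait(None)  # sentinel: no more events

    seen = [message async for message in iter_events(queue)]
    logger.removeHandler(handler)
    return seen


received = asyncio.run(demo())
```

On the client side this would let a user write something like `async for event in ...` and show progress while the cluster comes up, though how the messages would travel from the gateway server to the client is left open here.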
Any thoughts on this would be very welcome!