Retire pending workers

Open BitTheByte opened this issue 1 year ago • 2 comments

Based on discussion at https://github.com/dask/dask-kubernetes/issues/817

BitTheByte • Mar 20 '24 22:03

I believe https://github.com/dask/dask-kubernetes/pull/877 has implemented the same logic as this PR.

Have you been running this in a real world setting?

I've been running this PR on a 300-worker cluster for a while and it has worked without issues.

Do you have any thoughts about how we could test this?

Unfortunately no, since I'm not familiar with Dask's test suite. But generally, a test should launch n unschedulable workers and wait for the controller to scale them down to the correct number.
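As a rough illustration of that test idea, here is a minimal sketch in plain Python. It does not use the dask-kubernetes API or its test fixtures; `Worker`, `select_workers_to_retire`, and `simulate_downscale` are hypothetical names made up for this example, and the "Pending"/"Running" strings only mirror Kubernetes pod phases. It assumes the intended behaviour is that pending workers are removed before running ones when scaling down.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical stand-in for a worker pod; not a dask-kubernetes type.
    name: str
    phase: str  # "Pending" or "Running", mirroring Kubernetes pod phases


def select_workers_to_retire(workers, desired):
    """Return the names of workers to remove so that `desired` remain.

    Assumption for this sketch: pending workers are retired first.
    """
    excess = max(len(workers) - desired, 0)
    # Stable sort: Pending workers come first, original order is kept otherwise.
    ordered = sorted(workers, key=lambda w: w.phase != "Pending")
    return [w.name for w in ordered[:excess]]


def simulate_downscale():
    # Three running workers plus four that never got scheduled.
    workers = [Worker(f"running-{i}", "Running") for i in range(3)]
    workers += [Worker(f"pending-{i}", "Pending") for i in range(4)]
    retired = select_workers_to_retire(workers, desired=3)
    remaining = [w for w in workers if w.name not in retired]
    return retired, remaining
```

A real test would create the unschedulable workers on an actual cluster and poll until the controller has scaled down, but the assertions would have the same shape: the remaining count matches the target and none of the running workers were removed.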

BitTheByte • Apr 09 '24 20:04

I just bumped this PR to use the latest code from main but something in here is still causing the CI to hang. It's not obvious to me what is causing it, but we will need to get to the bottom of it to be able to get this merged. @BitTheByte do you think you will have some time to try and push this over the line?

jacobtomlinson • Apr 30 '24 20:04