cugraph
[BUG] Using `Client.wait_for_workers` Does Not Properly Wait for Workers
While running benchmarks for the GNN packages in a multi-node environment, @jnke2016 and I found that calling `Client.wait_for_workers` does not work properly and causes a hang or crash when running a Dask workflow. We currently work around this with a separate script (`wait_for_workers.py`) that waits for all workers before a workflow is launched. That workaround should be removed once the bug is fixed, so that `Client.wait_for_workers` can be called as the Dask API intends.
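For anyone reading along, here is a rough sketch of the kind of polling such a workaround can do. It is not the actual `wait_for_workers.py` from our benchmarks. The helper name `wait_for_worker_count`, its parameters, and the scheduler file path in the comments are all hypothetical. It only assumes that the number of connected workers can be read from some callable, for example `len(client.scheduler_info()["workers"])` on a Dask client.

```python
import time
from typing import Callable


def wait_for_worker_count(
    get_worker_count: Callable[[], int],
    n_workers: int,
    timeout: float = 60.0,
    poll_interval: float = 0.5,
) -> int:
    """Poll until at least n_workers are reported, else raise TimeoutError.

    Hypothetical helper for illustration; not part of Dask or cuGraph.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = get_worker_count()
        if count >= n_workers:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {count} of {n_workers} workers connected after {timeout}s"
            )
        time.sleep(poll_interval)


# Possible wiring to a Dask client (needs dask.distributed; not executed here):
#   from dask.distributed import Client
#   client = Client(scheduler_file="scheduler.json")  # path is an assumption
#   wait_for_worker_count(
#       lambda: len(client.scheduler_info()["workers"]), n_workers=16
#   )
```

Raising on timeout rather than blocking forever at least turns a silent hang into a visible error, which is easier to debug in a multi-node job.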
Possibly related to https://github.com/dask/distributed/pull/8314 ?
Could be, I'll definitely test once that PR is merged.
Not sure it will be, sorry. The approach I had there was not considered appropriate long term. I'll see if I can dig up the current state of any discussions.
> The approach I had there was not considered appropriate long term. I'll see if I can dig up the current state of any discussions
@wence- , did you get any feedback?