Welcome_to_BlazingSQL_Notebooks
Number of workers not matching number of nodes
+1 on building such an awesome product, guys. Here's an issue I've run into a couple of times:
If you hit an OOM or do something else that corrupts state, you can lose workers that won't come back with a
bc.dask_client.restart()
or client.restart()
This isn't a huge issue because it can be quickly fixed by stopping and starting the cluster, and if a 32-node cluster drops to 25 workers, everything still works.
More of an issue: I just stopped and started a 128-node cluster and it came up with only 1 worker. Restarting the Dask client from within Python didn't help. I'm trying to reproduce it. I took some screenshots and kept the logs, and will send them over.
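For anyone trying to confirm the same symptom, here is a rough sketch of how the worker count could be checked after a restart. It assumes the client is a standard `dask.distributed.Client` (for example `bc.dask_client`), and uses its `restart()` and `nthreads()` methods. The helper name `restart_and_count` and the `expected_workers` parameter are made up for illustration; they are not part of BlazingSQL or Dask.

```python
def restart_and_count(client, expected_workers):
    """Restart the cluster's workers and report how many are connected.

    `client` is assumed to behave like a dask.distributed.Client.
    Returns (live_workers, missing_workers).
    """
    # Ask the scheduler to restart every worker process.
    client.restart()
    # nthreads() maps each connected worker address to its thread count,
    # so its length is the number of workers the scheduler can see.
    live = len(client.nthreads())
    return live, expected_workers - live


# Hypothetical usage against a running cluster (not executed here):
#   live, missing = restart_and_count(bc.dask_client, 128)
#   print(f"{live} workers up, {missing} missing")
```

If `missing` stays above zero after the restart, that matches the behavior described above, where only stopping and starting the cluster brings the workers back.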
JB