xgboost
xgboost dask with specified address/port tries to re-bind to same port in some situations
The default behavior of xgboost with dask is to bind a listener to an ephemeral port on the scheduler.
We had been trying to deploy dask in a way where we only opened specified ports on the scheduler (specifically, running the dask scheduler in a container using bridge networking mode). We tried using `xgboost.scheduler_address` to set the address/port to one that we explicitly opened.
The problem is that in certain circumstances xgboost appears to try to re-bind using `xgboost.scheduler_address`, which fails with an `Address already in use` error. This happens when running training a second time, and might happen even when running training once on a larger dataset (not sure about this, though).
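To illustrate the failure mode outside of xgboost, here is a minimal sketch using plain Python sockets. It is not xgboost's tracker code. The `try_bind` helper is hypothetical, and the port is picked by the OS only to keep the demo self-contained. It shows that a second bind to an address/port whose first listener is still open fails with `EADDRINUSE`, which is the error we see.

```python
import errno
import socket


def try_bind(host: str, port: int) -> socket.socket:
    """Hypothetical helper: bind a TCP listener to (host, port) and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


# Stand-in for the first training run: a listener is bound and left open.
# Port 0 asks the OS for a free port; a real deployment would use the
# explicitly opened port instead.
first = try_bind("127.0.0.1", 0)
port = first.getsockname()[1]

# Stand-in for the second training run: bind to the same port again
# while the first listener has not been released.
try:
    second = try_bind("127.0.0.1", port)
    second.close()
    rebind_errno = None
except OSError as exc:
    rebind_errno = exc.errno  # EADDRINUSE: "Address already in use"

first.close()
```

If the first listener were closed before the second bind, the second bind would be expected to succeed.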
More details here: https://github.com/coiled/coiled-runtime/issues/150
I'm guessing it should either release or re-use the listener it originally bound, and it's not doing that.
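As a sketch of the "release" option, again with plain sockets rather than xgboost internals: scoping the listener with a context manager guarantees it is closed when a run ends, so the next run can bind the same fixed port. The names `listener` and `run_training_round` are hypothetical and only stand in for whatever owns the tracker socket.

```python
import socket
from contextlib import contextmanager


@contextmanager
def listener(host: str, port: int):
    """Hypothetical helper: yield a bound listener and always close it on exit."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding while old connections on this port linger in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        sock.listen()
        yield sock
    finally:
        sock.close()


def run_training_round(host: str, port: int) -> int:
    """Stand-in for one training run that needs the fixed port."""
    with listener(host, port) as sock:
        return sock.getsockname()[1]


# Find a currently free port to play the role of the explicitly opened one.
with listener("127.0.0.1", 0) as probe:
    fixed_port = probe.getsockname()[1]

# Two consecutive rounds on the same port: the second works because the
# first round released its listener on exit.
ports_used = [run_training_round("127.0.0.1", fixed_port) for _ in range(2)]
```

The "re-use" option would instead keep one listener alive across runs and hand it to each run, which avoids rebinding altogether.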
(In case it matters, our solution is probably going to be not using a networking mode that requires opening explicit ports.)
It should release the port once the training is finished. I'm not sure what's happening. Will try to reproduce it on my end.
Perhaps the tracker is not deleted: https://github.com/dmlc/xgboost/blob/545fd4548e303931dafd98d6606454fcdc2b8f2f/python-package/xgboost/tracker.py#L205