dmlc-core

Restore slurm tracker?

Open thvasilo opened this issue 6 years ago • 0 comments

There exists code for using the SLURM scheduler as a tracker for distributed training, but it was removed as an option from submit.py some time ago.

Lately I've been training XGBoost on an MPI cluster. I haven't been able to get the MPI tracker to work, but reinstating the SLURM tracker seems to work after I made some changes to the command being called.

So, would the community consider adding SLURM back as an option, or is it supposed to be superseded by the MPI tracker now? If so, has anyone gotten the MPI tracker to train XGBoost recently?

thvasilo, Oct 07 '19 10:10