sagemaker-xgboost-container

Distributed training: Add nanny process to terminate Rabit

Open • asadoughi opened this issue 5 years ago • 1 comment

As an additional safeguard, add a nanny process to terminate Rabit in case it hangs at the end of training unexpectedly.
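
A minimal sketch of what such a safeguard could look like, not the container's actual implementation. It assumes the Rabit tracker or worker runs as a child process reachable through a `subprocess.Popen` handle; `reap_after_grace` and its parameters are hypothetical names chosen for illustration. For simplicity the watchdog here is a blocking call rather than a separate nanny process.

```python
import subprocess


def reap_after_grace(proc, grace_seconds, kill_after=5.0):
    """Wait for `proc` to exit on its own; terminate it if it hangs.

    Hypothetical helper: `proc` stands in for a hung Rabit process.
    Returns True if the process had to be terminated, False if it
    exited by itself within `grace_seconds`.
    """
    try:
        # Normal case: the process finishes shortly after training ends.
        proc.wait(timeout=grace_seconds)
        return False
    except subprocess.TimeoutExpired:
        # The process outlived the grace period: ask it to stop.
        proc.terminate()
        try:
            proc.wait(timeout=kill_after)
        except subprocess.TimeoutExpired:
            # It ignored the polite request, so force it down.
            proc.kill()
            proc.wait()
        return True
```

The caller would invoke this once training is expected to have finished, with a grace period long enough that a healthy shutdown is never interrupted.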

asadoughi, Jul 23 '19 17:07