Allow one task executor role to make the other roles exit
Why
Sometimes, when using the TensorFlow Estimator API, users do additional work in the CHIEF role after training has finished. The PS tasks keep running during that time, which wastes the resources allocated to them.
So maybe we need a new mechanism that lets users mark the training job as finished in their Python script and notify the AM to stop the other task executors.
This could be a significant improvement for saving resources. @oliverhu Please let me know what you think.
Can you elaborate a bit more? It is not a problem for us.
As we know, the PS won't stop until the chief has finished. But the PS is actually only needed for training. Suppose the chief has two tasks:
- Training, which needs to cooperate with the PS.
- Some other work, which may be time-consuming and does not need to cooperate with the PS.
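
To make the two phases concrete, here is a rough sketch of the flow I have in mind. It is only a simulation: `FakeApplicationMaster` and `mark_training_finished` are hypothetical names made up for illustration and are not an existing TonY or TensorFlow API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApplicationMaster:
    """Hypothetical stand-in for the AM; not a real TonY API."""
    running: set = field(default_factory=lambda: {"chief", "ps"})

    def mark_training_finished(self, caller: str) -> None:
        # Stop every task executor except the one that made the call.
        self.running = {role for role in self.running if role == caller}


def run_chief(am: FakeApplicationMaster) -> list:
    events = []
    # Phase 1: training, which needs the PS to be alive.
    events.append(("train", sorted(am.running)))
    # Proposed step: tell the AM training is done so it can stop the PS.
    am.mark_training_finished(caller="chief")
    # Phase 2: time-consuming work that does not need the PS.
    events.append(("post_process", sorted(am.running)))
    return events


am = FakeApplicationMaster()
events = run_chief(am)
for phase, roles in events:
    print(phase, roles)
```

With something like this, the PS would only hold its resources during the first phase instead of for the whole lifetime of the chief.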