Allow one task executor role to make the other roles exit

Open zuston opened this issue 4 years ago • 3 comments

Why

Sometimes, when using the TensorFlow Estimator API, users do additional work in the CHIEF role after training has finished. The PS tasks keep running during that work, which wastes the resources allocated to them.

So maybe we need to introduce a new mechanism that lets users mark the training job as finished in their Python script and notify the AM to stop the other task executors.

zuston avatar Jan 19 '22 03:01 zuston

This could be a great improvement for saving resources. @oliverhu Please let me know what you think.

zuston avatar Jan 19 '22 03:01 zuston

Can you elaborate a bit more? It is not a problem for us.

oliverhu avatar Jan 19 '22 07:01 oliverhu

As we know, the PS won't stop until the chief has finished. But the PS is actually only needed for training. Suppose the chief has two tasks:

  1. Training. This needs to cooperate with the PS.
  2. Some other work, which may be a time-consuming operation. This does not need to cooperate with the PS.
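
To illustrate the two phases, here is a minimal simulation in plain Python. It does not use any TonY or TensorFlow API: the `training_finished` event is only a hypothetical stand-in for "notify the AM that training is done", and `run_job`, `ps`, and `chief` are made-up names. It only shows that the PS could exit as soon as phase 1 ends instead of waiting for the whole chief process.

```python
import threading
import time


def run_job(train_secs=0.05, post_secs=0.2):
    """Simulate a chief with two phases and one PS that may exit early."""
    # Hypothetical stand-in for the proposed "training finished" notification.
    training_finished = threading.Event()
    timeline = {}
    start = time.monotonic()

    def ps():
        # The PS is only needed while training is running.
        training_finished.wait()
        timeline["ps_exit"] = time.monotonic() - start

    def chief():
        time.sleep(train_secs)    # phase 1: training, cooperates with the PS
        training_finished.set()   # mark training as finished
        time.sleep(post_secs)     # phase 2: other work, PS not needed
        timeline["chief_exit"] = time.monotonic() - start

    threads = [threading.Thread(target=ps), threading.Thread(target=chief)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timeline


if __name__ == "__main__":
    result = run_job()
    print("PS exited after", round(result["ps_exit"], 2), "s")
    print("Chief exited after", round(result["chief_exit"], 2), "s")
```

In this sketch the PS releases its resources roughly `post_secs` earlier than it would if it waited for the chief to exit.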

zuston avatar Jan 19 '22 07:01 zuston