
Master should handle the errMsg that workers report

Open chunyang-wen opened this issue 6 years ago • 0 comments

Currently, when a worker reports an error message to the master, the master does not seem to handle it correctly.

If we tolerate failures on some records, the master should not fail the job as long as the failure rate is acceptable.

If we do not tolerate any failures, the job should end up in the failed state.

Workers now report statistics to the master, so the master can make this decision based on them.
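
A rough sketch of the decision I have in mind. This is not existing ElasticDL code: `JobStats`, `report`, `should_fail_job`, and `max_failure_rate` are hypothetical names used only to illustrate the idea, and I am assuming each worker report carries a processed count (including failed records) and a failed count.

```python
from dataclasses import dataclass


@dataclass
class JobStats:
    """Hypothetical running totals the master accumulates from worker reports."""

    total_records: int = 0
    failed_records: int = 0

    def report(self, processed: int, failed: int) -> None:
        # `processed` counts every record the worker attempted, failed ones included.
        self.total_records += processed
        self.failed_records += failed


def should_fail_job(stats: JobStats, max_failure_rate: float = 0.0) -> bool:
    """Return True when the observed failure rate exceeds the allowed rate.

    The default of 0.0 means no failures are tolerated, so a single
    failed record is enough to fail the job.
    """
    if stats.total_records == 0:
        # Nothing reported yet, so there is no basis for failing the job.
        return False
    return stats.failed_records / stats.total_records > max_failure_rate
```

With `max_failure_rate=0.0` this covers the "no failures supported" case, and a positive threshold covers the "acceptable failure rate" case, so one check could serve both modes.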

chunyang-wen · Oct 31 '19 08:10