shifu
TensorFlow Straggler Mitigation by Speculative Execution
At each iteration, collect per-worker statistics and check for slow workers, for example using the standard deviation (STDDEV) of iteration times. If a worker is an outlier, a standby backup worker from a backup pool could be launched to run its task.
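A minimal sketch of the proposed check, assuming per-worker wall-clock times are available for each iteration. The names `find_stragglers` and `assign_backups`, the threshold `k`, and the list-based backup pool are all hypothetical and are not part of TensorFlow. A worker is flagged when its time exceeds mean + k * stddev. Note that with only a handful of workers a single outlier inflates the stddev enough that this test rarely fires, so `k` has to be tuned to the cluster size.

```python
import statistics


def find_stragglers(iter_times, k=2.0):
    """Return ids of workers whose iteration time exceeds mean + k * stddev.

    iter_times: dict mapping a worker id to its wall-clock time (seconds)
    for the current iteration. `k` is an assumed tuning threshold.
    """
    times = list(iter_times.values())
    if len(times) < 2:
        return []  # not enough samples to estimate a spread
    mean = statistics.mean(times)
    std = statistics.pstdev(times)  # population stddev over this iteration
    if std == 0:
        return []  # all workers took the same time; nothing is an outlier
    threshold = mean + k * std
    return sorted(w for w, t in iter_times.items() if t > threshold)


def assign_backups(stragglers, backup_pool):
    """Pair each straggler with an idle worker taken from a hypothetical pool."""
    assignments = {}
    for w in stragglers:
        if not backup_pool:
            break  # pool exhausted; remaining stragglers get no backup
        assignments[w] = backup_pool.pop(0)
    return assignments


# Usage: nine workers at 1 s and one at 5 s; only the slow one is flagged.
times = {f"worker-{i}": 1.0 for i in range(9)}
times["worker-9"] = 5.0
slow = find_stragglers(times)
backups = assign_backups(slow, ["backup-0", "backup-1"])
```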
There is no need to do that. Backup workers are already implemented in TF: each iteration only takes the results of the fastest N workers and discards the slowest C stragglers.
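A simplified simulation of the backup-worker idea, not TensorFlow's actual implementation: N + C workers compute gradients, the N that finish first are averaged, and the C slowest are dropped. In TF 1.x this behaviour is configured through `tf.compat.v1.train.SyncReplicasOptimizer` by setting `replicas_to_aggregate=N` and `total_num_replicas=N + C`. The function name `aggregate_fastest` and the (finish_time, gradient) input format are assumptions made for illustration.

```python
def aggregate_fastest(results, n):
    """Average gradients from the fastest n workers and drop the rest.

    results: list of (finish_time, gradient) tuples, one per worker, where
    gradient is a list of floats. With len(results) == n + c, the c slowest
    workers are treated as stragglers and their gradients are discarded.
    Returns (averaged_gradient, number_of_dropped_workers).
    """
    if n <= 0 or n > len(results):
        raise ValueError("n must be in [1, len(results)]")
    # Keep only the first n workers to finish this iteration.
    fastest = sorted(results, key=lambda r: r[0])[:n]
    dim = len(fastest[0][1])
    avg = [sum(g[i] for _, g in fastest) / n for i in range(dim)]
    return avg, len(results) - n


# Usage: N = 2, C = 1; the straggler finishing at 9.0 s is ignored.
results = [(1.0, [1.0, 2.0]), (3.0, [3.0, 4.0]), (9.0, [100.0, 100.0])]
avg, dropped = aggregate_fastest(results, 2)
```

Because the slowest gradients are simply discarded, no per-iteration statistics or standby pool are needed; the cost is C workers' worth of wasted compute each step.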