support for fault tolerance and straggler mitigation
Hi, I noticed that fault tolerance and straggler mitigation support is listed in the future plans section. How is that work progressing?
Also, a related paper from your team, "Elastic Parameter Server Load Distribution in Deep Learning Clusters", says that its implementation is based on BytePS.