ps-lite

Does PS-lite provide node failure tolerance for server nodes?

Open DanishKhan14 opened this issue 9 years ago • 1 comment

I tried playing with the parameter_server linear example, and killing a server process/node hangs the running job. Shouldn't the replicated node take over for the killed server, as described in the paper?

Any help in this regard will be highly appreciated.

Thanks, Danish

DanishKhan14, Dec 14 '16 22:12

If you launched the job via YARN, one recovery node will be added, but the parameters held by the killed server are lost because no backup exists.

formath, Jan 26 '17 08:01
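
To make the distinction in the reply above concrete, here is a minimal toy sketch in Python of why a replacement server node does not bring the parameters back unless they were replicated somewhere. Nothing here is ps-lite code or its API: `ToyServer`, `push`, and `recover` are hypothetical names invented for illustration, and the replication scheme is a simplification assumed for the example.

```python
class ToyServer:
    """Hypothetical in-memory key-value server; not a ps-lite class."""

    def __init__(self):
        self.store = {}

    def push(self, key, value, replica=None):
        # Write locally, and mirror the write to a replica if one is given.
        self.store[key] = value
        if replica is not None:
            replica.store[key] = value


def recover(replica=None):
    """Start a replacement server, restoring state only if a replica exists."""
    fresh = ToyServer()
    if replica is not None:
        fresh.store.update(replica.store)
    return fresh


# Case 1: no backup. The replacement node starts with an empty store.
server = ToyServer()
server.push("w0", 0.5)
server = None  # simulate the server process being killed
recovered_without_backup = recover()

# Case 2: every write was mirrored, so the replacement can be restored.
replica = ToyServer()
server = ToyServer()
server.push("w0", 0.5, replica=replica)
server = None  # simulate the server process being killed
recovered_with_backup = recover(replica)

print(recovered_without_backup.store)  # {}
print(recovered_with_backup.store)     # {'w0': 0.5}
```

Case 1 corresponds to the behaviour described in the reply: a new node joins, but its share of the model is gone. Case 2 is the behaviour the question expected from the paper.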