ps-lite
Does ps-lite provide failure tolerance for server nodes?
I tried the parameter_server linear example, and killing a server process/node causes the running job to hang. Shouldn't a replica node take over for the killed server, as described in the paper?
Any help in this regard will be highly appreciated.
Thanks, Danish
If you launched the job via YARN, one recovery node will be added, but the parameters held by the killed server are lost because there is no backup.