cirrus
Multiple PS
Multiple parameter servers for logistic regression.
I still need to make the code smart enough to switch between LR in a multiple PS configuration and a non-multiple PS configuration for other models (like CF).
Working on multithreading a few of the operations. There is currently a dip of about 1k/sec, but it doesn't appear to come from inefficiencies in MultiplePSSparseServerInterface.
Fixed the dip in performance. Finalizing PR.
There are a few correctness checks that need to be completed, and a few bugs to be ironed out.
As of now (on Ubuntu machines):
- The PS occasionally crashes on start. The error is not reliably reproducible. I think it might have to do with poll thread concurrency.
- Workers crash after a few minutes. Not sure why.
- Updates per second do not scale with the number of parameter servers. There is no loss in the number of updates, but there is no increase either. Not sure why...
git-clang-format was removed while I was debugging Travis build errors. I will put it back before finalizing the PR.
Make Multiple PS Interface a subclass
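A minimal sketch of the subclassing idea, so that a model can switch between a single PS and a multiple PS configuration through one base type. All names here (`PSInterface`, `SinglePSInterface`, `MultiPSInterface`, `make_interface`) are hypothetical and not taken from the cirrus code. The in-memory maps stand in for real server connections, and the plain modulo routing is only an illustrative choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical base class: what a model sees regardless of how many PS exist.
class PSInterface {
 public:
  virtual ~PSInterface() = default;
  virtual void send_gradient(uint64_t index, double value) = 0;
  virtual double get_weight(uint64_t index) const = 0;
};

// Single-server variant; the map stands in for one remote parameter server.
class SinglePSInterface : public PSInterface {
 public:
  void send_gradient(uint64_t index, double value) override {
    // Plain gradient step with learning rate 1 (illustrative only).
    weights_[index] -= value;
  }
  double get_weight(uint64_t index) const override {
    auto it = weights_.find(index);
    return it == weights_.end() ? 0.0 : it->second;
  }

 private:
  std::unordered_map<uint64_t, double> weights_;
};

// Multi-server variant: routes each index to exactly one shard.
class MultiPSInterface : public PSInterface {
 public:
  explicit MultiPSInterface(std::size_t num_servers) : shards_(num_servers) {
    assert(num_servers > 0);
  }
  void send_gradient(uint64_t index, double value) override {
    shards_[index % shards_.size()].send_gradient(index, value);
  }
  double get_weight(uint64_t index) const override {
    return shards_[index % shards_.size()].get_weight(index);
  }

 private:
  std::vector<SinglePSInterface> shards_;
};

// Hypothetical factory: callers pick the configuration with one number.
std::unique_ptr<PSInterface> make_interface(std::size_t num_servers) {
  if (num_servers <= 1) return std::make_unique<SinglePSInterface>();
  return std::make_unique<MultiPSInterface>(num_servers);
}
```

With this shape, model code such as LR or CF would hold only a `std::unique_ptr<PSInterface>` and would not need to know which configuration is active.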
This code doesn't compile.
@andrewmzhang Can you fix this ASAP? Doesn't compile.
I'll get this fixed today
Currently working on fixing the PR review items. LR and MF work correctly (they converge, no crashes, etc).
Sorry for the force pushes. I cleaned up some unclear commit messages.
Please fix the conflicts.
Note to myself to check for the naming of training datasets in S3.
Note to remove CSV.
Note to switch the hash function to murmur.
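For the hash switch, one option is the 64-bit finalizer from MurmurHash3 (`fmix64`), applied to an integer feature index before it is mapped to a server. This is only a sketch: the helper `shard_for` is hypothetical, and using the finalizer alone (instead of the full MurmurHash3 over a byte buffer) is an assumption, not necessarily what this PR will do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64-bit finalizer (fmix64) from MurmurHash3: mixes all input bits
// so that nearby or strided integers spread across the output range.
inline uint64_t fmix64(uint64_t k) {
  k ^= k >> 33;
  k *= 0xff51afd7ed558ccdULL;
  k ^= k >> 33;
  k *= 0xc4ceb9fe1a85ec53ULL;
  k ^= k >> 33;
  return k;
}

// Hypothetical helper: choose which parameter server owns a feature index.
inline std::size_t shard_for(uint64_t index, std::size_t num_servers) {
  assert(num_servers > 0);
  return static_cast<std::size_t>(fmix64(index) % num_servers);
}
```

Mixing before taking the modulo matters when indices are strided: with a plain `index % num_servers`, indices that are all multiples of `num_servers` would land on the same server.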
PR is ready
Please fix conflicts.
Some requested changes are still open. Can you mark the ones that have been resolved?