Multiple PS

Open andrewmzhang opened this issue 6 years ago • 17 comments

Multiple parameter servers for logistic regression.
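
To make the idea concrete, here is a minimal sketch of how a sparse gradient might be split across several parameter servers. It is not the code in this PR: the function name `partition_by_server` and the modulo ownership rule are assumptions made for illustration only.

```python
def partition_by_server(sparse_gradient, num_servers):
    """Split {feature_index: value} into one dict per parameter server.

    Assumption (hypothetical rule): server k owns every feature whose
    index satisfies index % num_servers == k.
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    shards = [dict() for _ in range(num_servers)]
    for index, value in sparse_gradient.items():
        shards[index % num_servers][index] = value
    return shards


# A worker would send shards[k] to server k instead of the whole gradient.
gradient = {0: 0.5, 3: -1.0, 4: 0.25, 7: 2.0}
shards = partition_by_server(gradient, 2)
```

With this rule each server only ever sees the features it owns, so the per-server update load should shrink as servers are added, provided the indices are spread evenly.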

andrewmzhang avatar Aug 03 '18 13:08 andrewmzhang

I still need to make the code smart enough to switch between the multiple-PS configuration for LR and the single-PS configuration used by other models (like CF).

andrewmzhang avatar Aug 03 '18 13:08 andrewmzhang

Working on multithreading a few of the operations. There is currently a dip of about 1k/sec in performance, but it doesn't appear to come from inefficiencies in MultiplePSSparseServerInterface.

andrewmzhang avatar Aug 06 '18 11:08 andrewmzhang

Fixed the dip in performance. Finalizing PR.

andrewmzhang avatar Aug 06 '18 20:08 andrewmzhang

There are a few correctness checks that need to be completed, and a few bugs to be ironed out.

As of now (on Ubuntu machines):

  1. Occasionally the PS crashes on start. The error is not reliably reproducible. I think it might have to do with poll thread concurrency.
  2. Workers crash after a few minutes. Not sure why.
  3. Updates per second do not scale with the number of parameter servers. There is no loss in the number of updates, but there is no increase either. Not sure why.

git-clang-format was removed while I was debugging Travis build errors. I will put it back before finalizing the PR.

andrewmzhang avatar Aug 09 '18 21:08 andrewmzhang

Make Multiple PS Interface a subclass
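
A rough sketch of what the subclass approach could look like, so that training code can use either configuration through the same calls. Only the name `MultiplePSSparseServerInterface` comes from this thread; the base class `SparseServerInterface`, its methods, the in-memory "servers", and the modulo routing are all hypothetical stand-ins, not the project's actual API.

```python
class SparseServerInterface:
    """Hypothetical single-PS client, backed here by an in-memory dict."""

    def __init__(self):
        self.weights = {}

    def send_gradient(self, gradient, learning_rate=0.1):
        # Apply a plain SGD step for each feature in the sparse gradient.
        for index, value in gradient.items():
            self.weights[index] = self.weights.get(index, 0.0) - learning_rate * value

    def get_weights(self, indices):
        return {i: self.weights.get(i, 0.0) for i in indices}


class MultiplePSSparseServerInterface(SparseServerInterface):
    """Same public methods, but each feature is routed to the server that owns it."""

    def __init__(self, num_servers):
        super().__init__()
        # Assumption: one single-PS client per server.
        self.servers = [SparseServerInterface() for _ in range(num_servers)]

    def _owner(self, index):
        # Assumption: ownership by index modulo the number of servers.
        return self.servers[index % len(self.servers)]

    def send_gradient(self, gradient, learning_rate=0.1):
        for index, value in gradient.items():
            self._owner(index).send_gradient({index: value}, learning_rate)

    def get_weights(self, indices):
        merged = {}
        for i in indices:
            merged.update(self._owner(i).get_weights([i]))
        return merged


def train_step(interface, gradient):
    """Model code only depends on the base interface."""
    interface.send_gradient(gradient)
    return interface.get_weights(sorted(gradient))


single = train_step(SparseServerInterface(), {1: 1.0, 2: -2.0})
multi = train_step(MultiplePSSparseServerInterface(3), {1: 1.0, 2: -2.0})
```

Because `train_step` never checks which class it was given, a model such as LR could be handed the multiple-PS subclass while CF keeps the single-PS base class.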

andrewmzhang avatar Aug 15 '18 21:08 andrewmzhang

This code doesn't compile.

jcarreira avatar Sep 13 '18 16:09 jcarreira

@andrewmzhang Can you fix this ASAP? Doesn't compile.

jcarreira avatar Sep 18 '18 16:09 jcarreira

I'll get this fixed today

andrewmzhang avatar Sep 18 '18 16:09 andrewmzhang

Currently working on fixing the PR review items. LR and MF work correctly (they converge, no crashes, etc).

andrewmzhang avatar Nov 16 '18 13:11 andrewmzhang

Sorry for the force pushes. I cleaned up some unclear commit messages.

andrewmzhang avatar Nov 17 '18 15:11 andrewmzhang

Please fix the conflicts.

jcarreira avatar Jan 05 '19 23:01 jcarreira

Note to myself to check for the naming of training datasets in S3.

jcarreira avatar Jan 19 '19 23:01 jcarreira

Note to remove CSV.

andrewmzhang avatar Jan 24 '19 23:01 andrewmzhang

Note to switch the hash function to MurmurHash.
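
For reference, a small sketch of why a Murmur-style hash helps here. The thread does not say which Murmur variant is intended; this example assumes the 32-bit finalizer step (`fmix32`) from MurmurHash3, applied to an integer feature index. The helper `server_for` is a hypothetical name.

```python
def fmix32(h):
    """MurmurHash3 32-bit finalizer: mixes all input bits into all output bits."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def server_for(feature_index, num_servers):
    """Hypothetical helper: pick a parameter server from a mixed index."""
    return fmix32(feature_index) % num_servers


# Indices that are all multiples of 4 would pile onto one server under a
# plain `index % 4` rule; mixing first spreads them out.
strided = [i * 4 for i in range(1, 200)]
plain = {i % 4 for i in strided}
mixed = {server_for(i, 4) for i in strided}
```

Here `plain` contains a single server, while `mixed` covers more than one, which is the property that matters when feature indices are not uniformly distributed.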

andrewmzhang avatar Jan 25 '19 22:01 andrewmzhang

PR is ready

andrewmzhang avatar Feb 09 '19 05:02 andrewmzhang

Please fix conflicts.

jcarreira avatar Feb 12 '19 03:02 jcarreira

Some requested changes are still open. Can you mark the ones that have been resolved?

jcarreira avatar Feb 14 '19 05:02 jcarreira