Pull mechanism for client-server communication

Open • ewenw opened this issue on Jun 24, 2022 • 7 comments

FedScale is currently built on the assumption that the central server communicates with clients through a push mechanism, in which the server initiates signals (handshake/training/notTraining/etc.) to available devices. Is it possible to consider a pull-based system, in which each device initiates communication by periodically sending requests to the server to ask for its next action? In a realistic setting, a device would periodically ping the server once its training criteria are met (enough data, sufficient battery, app open, etc.), and if the client is selected, the server would respond with the current model for federated training.

ewenw • Jun 24 '22 21:06
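The device-side flow described in the issue can be sketched in Python as a minimal illustration, not FedScale code. `DeviceState`, the thresholds `MIN_NEW_SAMPLES` and `MIN_BATTERY`, and the helpers `ready_for_training`, `server_respond`, and `run_checkins` are all hypothetical. Each element of `states` stands for one polling period, so the loop can be exercised without real timers.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Snapshot of a device at one polling period (hypothetical fields)."""
    num_new_samples: int
    battery_level: float  # fraction in [0.0, 1.0]
    app_open: bool


# Hypothetical thresholds chosen for illustration only.
MIN_NEW_SAMPLES = 32
MIN_BATTERY = 0.5


def ready_for_training(state: DeviceState) -> bool:
    """True only when every local training criterion is met."""
    return (
        state.num_new_samples >= MIN_NEW_SAMPLES
        and state.battery_level >= MIN_BATTERY
        and state.app_open
    )


def server_respond(client_id, selected_ids, model):
    """Server side of a ping: send the model if this client was selected,
    otherwise None, meaning 'nothing to do, ask again later'."""
    return model if client_id in selected_ids else None


def run_checkins(states, ping_server):
    """Visit one DeviceState per polling period and ping the server only in
    periods where the device is eligible; return the responses received."""
    responses = []
    for state in states:
        if ready_for_training(state):
            responses.append(ping_server())
    return responses
```

Keeping the eligibility check on the device means the server only hears from devices that could actually train right now, which is the core of the pull design proposed above.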

Hello. Thanks for trying FedScale. We want to note that FedScale is now indeed a pull-based system and can support real deployments. For example, clients/executors periodically ping the server for their next actions, just as you describe. :)

fanlai0990 • Jun 24 '22 21:06

Hi @fanlai0990, from my understanding, the server samples a set of clients and then uses the client executors to push signals such as CLIENT_TRAIN and SHUT_DOWN to those clients. This seems like a push mechanism to me.

ewenw • Jun 24 '22 22:06

There might be a misleading variable name. In FedScale, each executor drives the execution of its client. The executor pings (pulls from) the aggregator for the next step and may receive either CLIENT_TRAIN for selected clients or DUMMY_MSG, which means there is nothing to do. But essentially, the client is polling the aggregator.

fanlai0990 • Jun 24 '22 22:06
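The polling pattern explained above can be illustrated with a toy, single-process sketch. Only the message names CLIENT_TRAIN, DUMMY_MSG, and SHUT_DOWN come from this thread; `Event`, `ToyAggregator`, and `executor_loop` are hypothetical and do not reflect FedScale's actual executor or aggregator interfaces. A real deployment would exchange these messages over the network rather than through an in-memory queue.

```python
import queue
from enum import Enum


class Event(Enum):
    CLIENT_TRAIN = "client_train"
    DUMMY_MSG = "dummy_msg"
    SHUT_DOWN = "shut_down"


class ToyAggregator:
    """Hypothetical stand-in for the aggregator. It never contacts executors;
    it only answers pings with the next queued instruction."""

    def __init__(self):
        self.pending = queue.Queue()

    def ping(self):
        # Hand out the next (event, client_id) pair, or DUMMY_MSG if nothing is pending.
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return (Event.DUMMY_MSG, None)


def executor_loop(aggregator, train, max_polls=100):
    """Poll the aggregator until SHUT_DOWN or max_polls; return the client ids trained."""
    trained = []
    for _ in range(max_polls):
        event, client_id = aggregator.ping()
        if event is Event.CLIENT_TRAIN:
            train(client_id)
            trained.append(client_id)
        elif event is Event.SHUT_DOWN:
            break
        # DUMMY_MSG: nothing to do this time; a real executor would typically
        # wait for a polling interval before pinging again.
    return trained
```

The key property is that every exchange starts with `aggregator.ping()` on the executor side, so even CLIENT_TRAIN is something the executor pulls rather than something pushed to it.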

Thanks for the clarification!

ewenw • Jun 24 '22 22:06

btw, are you considering a pull-based check-in system for new devices?

AmberLJC • Jun 24 '22 22:06

@AmberLJC yes, we imagine that when a device is ready for training and has new data to train on, it will periodically check in with the server. The server then decides whether the device should train based on its training history. This makes more sense in an async system.

ewenw • Jun 24 '22 22:06
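One way a server might decide, on each check-in, whether a device should train based on its training history is a simple cooldown-plus-quota rule. The sketch below assumes that rule purely for illustration; `CheckinServer`, `cooldown_rounds`, and `per_round_quota` are hypothetical and not part of FedScale.

```python
from dataclasses import dataclass, field


@dataclass
class CheckinServer:
    """Toy server that decides, on each device check-in, whether the device
    should train, using only the round in which it last trained."""
    cooldown_rounds: int = 2      # assumed: min rounds between two trainings
    per_round_quota: int = 10     # assumed: max participants per round
    current_round: int = 0
    accepted_this_round: int = 0
    last_trained: dict = field(default_factory=dict)  # device_id -> round

    def check_in(self, device_id: str) -> bool:
        last = self.last_trained.get(device_id)
        rested = last is None or self.current_round - last >= self.cooldown_rounds
        if rested and self.accepted_this_round < self.per_round_quota:
            # Record the decision so future check-ins see this history.
            self.last_trained[device_id] = self.current_round
            self.accepted_this_round += 1
            return True
        return False

    def next_round(self) -> None:
        self.current_round += 1
        self.accepted_this_round = 0
```

Because the decision happens at check-in time, a device that is turned away simply checks in again later, which fits the asynchronous setting mentioned above.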

@AmberLJC @fanlai0990 I think this is what we were discussing earlier re: having a separate check-in server that decouples devices checking in from the selector picking them.

mosharaf • Jun 24 '22 23:06
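The decoupling suggested here can be sketched as two independent pieces: a check-in buffer that devices write to whenever they are ready, and a selector that reads the currently fresh check-ins on its own schedule. Everything below (`CheckinBuffer`, its `ttl` freshness window, and `select` with uniform random sampling) is a hypothetical illustration rather than FedScale's design; any other selection policy could be swapped in for `select`.

```python
import random
import threading


class CheckinBuffer:
    """Hypothetical check-in service: devices register here, and a selector
    reads from it independently, so check-ins and selection are decoupled."""

    def __init__(self, ttl: float):
        self.ttl = ttl               # assumed: seconds a check-in stays valid
        self._lock = threading.Lock()
        self._seen = {}              # device_id -> time of last check-in

    def check_in(self, device_id, now):
        with self._lock:
            self._seen[device_id] = now

    def fresh(self, now):
        """Sorted ids of devices whose last check-in is within ttl of `now`."""
        with self._lock:
            return sorted(d for d, t in self._seen.items() if now - t <= self.ttl)


def select(buffer, now, k, rng):
    """Selector: sample up to k devices uniformly from the fresh check-ins."""
    candidates = buffer.fresh(now)
    return rng.sample(candidates, min(k, len(candidates)))
```

Devices only ever call `check_in`, and the selector only ever calls `fresh`, so the two sides can run at different rates (or in different processes, with a shared store in place of the dict).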