feat(framework) Add serverless federated learning support
Issue
Description
This PR introduces serverless federated learning to eliminate the need for a central aggregation server. In large-scale experiments, managing multiple servers became tedious and synchronization across heterogeneous client environments was a significant bottleneck. This feature addresses those challenges by leveraging a shared storage mechanism for model weight communication.
Related issues/PRs
Implements the feature request outlined in #4273.
Proposal
Explanation
This PR adds support for serverless federated learning through the following components:
- `SyncFederatedNode` and `AsyncFederatedNode`: These new node classes manage the communication of model weights to and from a shared storage (e.g., an S3 bucket). They support both synchronous and asynchronous federation strategies.
- For PyTorch, my recommendation is to instrument the training loop with `FederatedNode.update_parameters()`, which applies a federated strategy to update the model weights using other nodes' weights (see the sketch after this list). See `examples/serverless-pytorch` for a complete example.
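To make the intended usage concrete, here is a minimal sketch of an instrumented PyTorch training loop. Only the node classes and `FederatedNode.update_parameters()` come from this PR; the import path, the constructor arguments (`shared_folder`, `strategy`), the exact signature and return value of `update_parameters()`, and the `get_weights`/`set_weights` helpers are illustrative assumptions, not the final API.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from flwr.server.strategy import FedAvg
# Assumed import path for the new node classes; the actual module may differ.
from flwr.serverless import AsyncFederatedNode

# Toy data standing in for one node's CIFAR-10 partition.
dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Assumed constructor: a shared storage location plus the Flower strategy
# that drives aggregation.
node = AsyncFederatedNode(
    shared_folder="s3://my-bucket/experiment-1",
    strategy=FedAvg(),
)

def get_weights(model: nn.Module):
    # Serialize the state dict as a list of numpy arrays.
    return [v.detach().cpu().numpy() for v in model.state_dict().values()]

def set_weights(model: nn.Module, weights) -> None:
    # Load a list of numpy arrays back into the state dict, preserving key order.
    keys = model.state_dict().keys()
    model.load_state_dict({k: torch.tensor(w) for k, w in zip(keys, weights)})

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    # After each epoch, publish local weights to shared storage and apply
    # the federated strategy against whatever peer weights are available.
    updated = node.update_parameters(get_weights(model))
    if updated is not None:  # an async node may have nothing new yet (assumed)
        set_weights(model, updated)
```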
The core federated learning logic continues to be driven by existing Flower strategies (e.g., FedAvg). Because nodes exchange weights only through shared storage, there is no central aggregation server to provision, synchronize, or lose, which makes large-scale federated experiments more robust and easier to scale.
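A synchronous node would be constructed the same way. How a synchronous node learns the number of peers to wait for is not specified above, so the `num_nodes` argument in this sketch is purely hypothetical.

```python
from flwr.server.strategy import FedAvg
from flwr.serverless import SyncFederatedNode  # assumed import path

# Purely illustrative: a synchronous node that waits for all peers before
# applying the strategy. `num_nodes` is a hypothetical argument; the real
# constructor may discover peers differently.
node = SyncFederatedNode(
    shared_folder="s3://my-bucket/experiment-1",
    strategy=FedAvg(),
    num_nodes=4,  # hypothetical: block in update_parameters() until 4 nodes report
)
```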
Examples
An introductory example is provided for PyTorch (`examples/serverless-pytorch`), demonstrating serverless federated training on a partitioned CIFAR-10 dataset with artificial skew across partitions.