FedScale
Support k8s for job submission and management
Why are these changes needed?
Support using k8s to manage job lifecycles, including job submission, initialization, termination and clean-up.
TODO:
- [x] add README for k8s job management tutorial
- update `docker/driver.py` to use the k8s client APIs for job management; the driver now supports "default", "docker" and "k8s" modes.
- add a YAML generator to automate generation of k8s container configs.
- add new example k8s configs in benchmark
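
For reviewers unfamiliar with the config-generation step, here is a minimal sketch of what automating k8s container configs could look like. It is not the code in this PR: the function names, the pod naming scheme, the image name and the container commands are all hypothetical. It uses only the standard library and emits JSON, which `kubectl` accepts in place of YAML.

```python
import json


def make_pod_manifest(job_name, role, rank, image, command):
    """Build a Kubernetes Pod manifest (as a plain dict) for one process.

    All argument values are illustrative; the real generator may use
    different fields, names and defaults.
    """
    pod_name = f"{job_name}-{role}-{rank}"  # hypothetical naming scheme
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "labels": {"job": job_name, "role": role},
        },
        "spec": {
            # Training pods should not be restarted after they finish.
            "restartPolicy": "Never",
            "containers": [
                {"name": role, "image": image, "command": list(command)}
            ],
        },
    }


def generate_job_manifests(job_name, image, num_executors):
    """Return one aggregator manifest followed by one per executor."""
    manifests = [
        make_pod_manifest(
            job_name, "aggregator", 0, image,
            ["python", "aggregator.py"],  # hypothetical entry point
        )
    ]
    for rank in range(1, num_executors + 1):
        manifests.append(
            make_pod_manifest(
                job_name, "executor", rank, image,
                ["python", "executor.py"],  # hypothetical entry point
            )
        )
    return manifests


if __name__ == "__main__":
    # Print each manifest as JSON; each document could be saved to its
    # own file and passed to `kubectl apply -f`.
    for manifest in generate_job_manifests("demo-job", "example/fedscale:latest", 2):
        print(json.dumps(manifest, indent=2))
```

Labelling every pod with the job name is one way to make termination and clean-up a single selector-based delete, though the driver in this PR may track pods differently.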
Related issue number
Checks
- [x] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
- [x] I've made sure the following tests are passing.
  - Testing Configurations
    - k8s
      - [x] Dry Run (20 training rounds & 1 evaluation round)
      - [x] Cifar 10 (20 training rounds & 1 evaluation round)
      - [x] Femnist (20 training rounds & 1 evaluation round)
    - Regression 1: docker
      - [x] Cifar 10 (20 training rounds & 1 evaluation round)
      - [x] Femnist (20 training rounds & 1 evaluation round)
    - Regression 2: default
      - [x] Cifar 10 (20 training rounds & 1 evaluation round)
      - [x] Femnist (20 training rounds & 1 evaluation round)
- k8s