FedScale

Support k8s for job submission and management

Open • IKACE opened this issue 2 years ago • 0 comments

Why are these changes needed?

Support using k8s to manage job lifecycles, including job submission, initialization, termination and clean-up.
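The four lifecycle stages can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the function names (`make_api`, `submit_job`, `wait_until_running`, `terminate_job`) are hypothetical, and it assumes the official `kubernetes` Python client, whose `CoreV1Api` provides `create_namespaced_pod`, `read_namespaced_pod` and `delete_namespaced_pod`.

```python
import time


def make_api():
    """Build a CoreV1Api from the local kubeconfig (needs the `kubernetes` package)."""
    # Imported lazily so the helpers below can be exercised without a cluster.
    from kubernetes import client, config
    config.load_kube_config()
    return client.CoreV1Api()


def submit_job(api, manifests, namespace="default"):
    """Submission: create one pod per manifest dict and return the pod names."""
    names = []
    for manifest in manifests:
        api.create_namespaced_pod(namespace=namespace, body=manifest)
        names.append(manifest["metadata"]["name"])
    return names


def wait_until_running(api, names, namespace="default", timeout_s=300, poll_s=2.0):
    """Initialization: poll until every pod reports phase 'Running'."""
    deadline = time.monotonic() + timeout_s
    pending = set(names)
    while pending:
        for name in list(pending):
            pod = api.read_namespaced_pod(name=name, namespace=namespace)
            if pod.status.phase == "Running":
                pending.discard(name)
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"pods not running after {timeout_s}s: {sorted(pending)}"
                )
            time.sleep(poll_s)


def terminate_job(api, names, namespace="default"):
    """Termination and clean-up: delete every pod that belongs to the job."""
    for name in names:
        api.delete_namespaced_pod(name=name, namespace=namespace)
```

Passing the API object in as a parameter keeps the lifecycle helpers independent of how the cluster connection is created, which also makes them easy to try against a stub before pointing them at a real cluster.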

TODO:

  • [x] add README for k8s job management tutorial
  1. Update docker/driver.py to use the k8s client APIs for job management. The driver now supports the "default", "docker" and "k8s" modes.
  2. Add a YAML generator that automates the generation of k8s container configs.
  3. Add new example k8s configs in benchmark.
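As an illustration of what a container-config generator might produce, the sketch below builds a single-container Pod manifest as a plain dictionary. It is not the generator added here: `make_pod_manifest` and `write_manifest` are hypothetical names, the image and command in the usage note are placeholders, and the output is written as JSON (which is also valid YAML and is accepted by `kubectl apply -f`) so that only the standard library is needed.

```python
import json


def make_pod_manifest(name, image, command, ports=(), labels=None):
    """Return a Pod manifest (as a dict) with one container named after the pod."""
    container = {"name": name, "image": image, "command": list(command)}
    if ports:
        # Only emit a ports section when the container actually exposes ports.
        container["ports"] = [{"containerPort": int(p)} for p in ports]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": dict(labels or {})},
        # restartPolicy "Never": a finished training container is not restarted.
        "spec": {"restartPolicy": "Never", "containers": [container]},
    }


def write_manifest(manifest, path):
    """Serialize the manifest to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
```

For example, `make_pod_manifest("aggregator", "my-registry/aggregator:latest", ["python", "run.py"], ports=[30000])` yields a manifest for one aggregator pod; generating one manifest per executor in a loop gives the full set of configs for a job.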

Related issue number

Checks

  • [x] I've included any doc changes needed for https://fedscale.readthedocs.io/en/latest/
  • [x] I've made sure the following tests are passing.
  • Testing Configurations
    • k8s
      • [x] Dry Run (20 training rounds & 1 evaluation round)
      • [x] Cifar 10 (20 training rounds & 1 evaluation round)
      • [x] Femnist (20 training rounds & 1 evaluation round)
    • Regression 1: docker
      • [x] Cifar 10 (20 training rounds & 1 evaluation round)
      • [x] Femnist (20 training rounds & 1 evaluation round)
    • Regression 2: default
      • [x] Cifar 10 (20 training rounds & 1 evaluation round)
      • [x] Femnist (20 training rounds & 1 evaluation round)

IKACE, Oct 07 '22 05:10