
New RestServer Architecture: RestServer -> DB -> ApiServer

Open • yqwang-ms opened this issue 4 years ago • 1 comment

By leveraging a DB, RestServer can gain:

  1. Read-after-write (RAW) consistency
  2. High-performance, powerful queries: listing, paging, sorting, summarizing, etc.
  3. A larger storage quota and a longer retention duration
  4. Active and historical jobs unified and merged naturally, with no need to introduce a UID (https://github.com/microsoft/pai/issues/3935)
  5. Jobs that can be submitted idempotently by job name, have arbitrary metadata attached, and be queried uniquely by name (https://github.com/microsoft/pai/issues/3935)
  6. etc.
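To make benefits 2 and 5 concrete, here is a minimal in-memory sketch of what a DB-backed job table could give RestServer. Jobs are keyed by `{user}~{job}`. Resubmitting the same name returns the existing record instead of creating a duplicate, and listing sorts and pages over one table that holds both active and finished jobs. The `JobRecord` fields, the `JobStore` class, and the "return the existing record on a duplicate name" policy are all hypothetical. A real implementation would express the same operations in SQL, using a unique key on the job name plus `ORDER BY ... LIMIT ... OFFSET ...`.

```typescript
// Hypothetical record shape; field names are illustrative, not PAI's schema.
interface JobRecord {
  name: string;                     // unique key, "{user}~{job}"
  state: string;                    // e.g. "RUNNING", "SUCCEEDED"
  submittedAt: number;              // epoch millis
  metadata: Record<string, string>; // arbitrary user-attached metadata
}

interface Page<T> {
  items: T[];
  total: number; // total matching rows before paging
}

class JobStore {
  // Stands in for a DB table with a unique index on `name`.
  private readonly rows = new Map<string, JobRecord>();

  // Idempotent submit: the first call inserts; later calls with the same
  // name return the stored record unchanged (created = false).
  submit(record: JobRecord): { record: JobRecord; created: boolean } {
    const existing = this.rows.get(record.name);
    if (existing !== undefined) {
      return { record: existing, created: false };
    }
    this.rows.set(record.name, record);
    return { record, created: true };
  }

  // Attach metadata to an existing job; returns false if the job is unknown.
  annotate(name: string, key: string, value: string): boolean {
    const row = this.rows.get(name);
    if (row === undefined) return false;
    row.metadata[key] = value;
    return true;
  }

  // Filter by state, sort by submission time, then page; roughly
  // SELECT ... WHERE state = $1 ORDER BY submittedAt LIMIT $2 OFFSET $3.
  list(opts: { state?: string; offset: number; limit: number; newestFirst: boolean }): Page<JobRecord> {
    const filtered = Array.from(this.rows.values()).filter(
      (r) => opts.state === undefined || r.state === opts.state,
    );
    filtered.sort((a, b) =>
      opts.newestFirst ? b.submittedAt - a.submittedAt : a.submittedAt - b.submittedAt,
    );
    return {
      items: filtered.slice(opts.offset, opts.offset + opts.limit),
      total: filtered.length,
    };
  }
}
```

Because active and finished jobs live in the same table, "list history jobs" becomes an ordinary filtered, paged query rather than a merge of two sources.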

Features that depend on it:

  • List history jobs: https://github.com/microsoft/pai/issues/3845, https://github.com/microsoft/pai/issues/4610, https://github.com/microsoft/pai/issues/3935
  • Expose K8s events, as part of enriching job debugging info: https://github.com/microsoft/pai/issues/4649

New RestServer Architecture

In short, compared with the current architecture, we insert a DB between RestServer and ApiServer.
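A minimal sketch of the request flow through the inserted DB. It assumes the split suggested by the sub-tasks: RestServer reads and writes only the DB, a DB poller pushes pending job specs to ApiServer, and an API watcher copies framework status back into the DB. Because GET is served from the DB, a job is visible immediately after POST, even before ApiServer has seen it. `FakeApiServer`, `Db`, `syncPending`, `onFrameworkEvent`, and the `synced` flag are hypothetical names. The real controller would talk to Kubernetes and PostgreSQL instead.

```typescript
// Hypothetical row shape for the sketch.
interface JobRow {
  name: string;    // "{user}~{job}"
  spec: string;    // serialized job config as submitted
  synced: boolean; // true once the spec has been created in ApiServer
  status: string;  // last status reported by the watcher
}

// Stand-in for the Kubernetes ApiServer holding Framework objects.
class FakeApiServer {
  readonly frameworks = new Map<string, string>(); // name -> spec
  create(name: string, spec: string): void {
    this.frameworks.set(name, spec);
  }
}

// Stand-in for the database table.
class Db {
  readonly rows = new Map<string, JobRow>();
}

// RestServer: talks only to the DB, never directly to ApiServer.
class RestServer {
  private readonly db: Db;

  constructor(db: Db) {
    this.db = db;
  }

  postJob(name: string, spec: string): void {
    if (!this.db.rows.has(name)) {
      this.db.rows.set(name, { name, spec, synced: false, status: "WAITING" });
    }
  }

  // Served from the DB, so a freshly posted job is readable right away.
  getJob(name: string): JobRow | undefined {
    return this.db.rows.get(name);
  }
}

// DB poller: creates not-yet-synced rows in ApiServer; returns how many.
function syncPending(db: Db, api: FakeApiServer): number {
  let count = 0;
  db.rows.forEach((row) => {
    if (!row.synced) {
      api.create(row.name, row.spec);
      row.synced = true;
      count++;
    }
  });
  return count;
}

// API watcher callback: writes the status reported by ApiServer back
// into the DB so RestServer can serve it.
function onFrameworkEvent(db: Db, name: string, status: string): void {
  const row = db.rows.get(name);
  if (row !== undefined) {
    row.status = status;
  }
}
```

The design choice this illustrates is that ApiServer is no longer on the request path: RestServer latency and query power depend on the DB, while the poller and watcher keep the two stores converging in the background.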

Sub Tasks

  • [ ] Database Controller (StatefulSet with initializer, write merger, watcher, poller, and garbage collector)
    • [x] P0 Database schema
    • [x] P0 Database ORM interface in Node.js
    • [x] P0 Service setup (service.yaml, start.sh, config, etc.)
    • [ ] Initializer (handles schema sync, upgrades, legacy data, etc.)
      • [x] P0 Version table setup
      • [ ] P1 Legacy framework transfer @suiguoxin
    • [x] P0 Write merger
    • [x] P0.5 DB poller
    • [ ] API watcher
      • [x] P0 Watcher for framework
      • [ ] P2 Watcher for event
      • [ ] P2 Watcher for pod
    • [ ] #5653
  • [ ] Rest API Change
    • [x] P0 POST /api/v2/jobs
    • [x] P0 GET /api/v2/jobs
    • [x] P0 GET /api/v2/jobs/{user}~{job}
    • [x] P0 GET /api/v2/jobs/{user}~{job}/config
    • [x] P0 PUT /api/v2/jobs/{user}~{job}/executionType
    • [x] P0 Swagger update
    • [x] P1 /api/v2/jobs/{user}~{job}/job-attempts @suiguoxin #4716
    • [x] P1 /api/v2/jobs/{user}~{job}/job-attempts/{attemptIndex} @suiguoxin
  • [ ] Webportal
    • [x] P1 paging on list job page
  • [ ] Fluentd @suiguoxin #4716
    • [x] P1 Fluentd plugin multithread
    • [x] P1 Fluentd change for new schema (for framework history)
    • [x] P2 Fluentd for Pod
  • [ ] Other
    • [x] P1 Framework controller: decrease GC time?
    • [ ] P2 Make PostgreSQL a StatefulSet
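The issue does not spell out what the write merger does. One plausible reading, sketched here purely as an assumption, is that it coalesces bursts of watch events for the same framework, so the DB receives one write per object per flush instead of one per event. `WriteMerger`, `enqueue`, `flush`, and the numeric `resourceVersion` are hypothetical. Kubernetes resource versions are opaque strings, and the flush target here is a plain callback rather than a real DB client.

```typescript
// Hypothetical snapshot of a framework as seen by the API watcher.
interface FrameworkSnapshot {
  name: string;
  resourceVersion: number; // simplified: higher means newer (not true of real K8s)
  status: string;
}

class WriteMerger {
  // Latest pending snapshot per framework name.
  private pending = new Map<string, FrameworkSnapshot>();
  private readonly write: (batch: FrameworkSnapshot[]) => void;

  constructor(write: (batch: FrameworkSnapshot[]) => void) {
    this.write = write;
  }

  // Keep only the newest snapshot for each name; older events are dropped.
  enqueue(snap: FrameworkSnapshot): void {
    const current = this.pending.get(snap.name);
    if (current === undefined || snap.resourceVersion > current.resourceVersion) {
      this.pending.set(snap.name, snap);
    }
  }

  // Emit one merged batch (if any) and reset; returns rows written.
  flush(): number {
    const batch = Array.from(this.pending.values());
    this.pending = new Map();
    if (batch.length > 0) {
      this.write(batch);
    }
    return batch.length;
  }
}
```

In a long-running controller, `flush` would typically be called on a timer (for example via `setInterval`), trading a small delay in status freshness for far fewer DB writes.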

yqwang-ms, Jun 28 '20 03:06

Related to https://github.com/microsoft/pai/issues/4600

fanyangCS, Jul 09 '20 03:07