New RestServer Architecture: RestServer -> DB -> ApiServer
By leveraging a DB, RestServer gains:
- Read-after-write (RAW) consistency
- High-performance, powerful queries: listing, paging, sorting, summarizing, etc.
- Larger storage quota and longer retention
- Active and history jobs are unified and merged together naturally (no need to introduce UID) (https://github.com/microsoft/pai/issues/3935)
- A job name can be submitted idempotently, carry arbitrary metadata, and be queried uniquely (https://github.com/microsoft/pai/issues/3935)
- etc
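The job-name and query benefits listed above can be illustrated with a minimal sketch. It uses an in-memory `Map` as a stand-in for the DB; the `JobStore` class, its `submit` and `list` methods, and the `offset`/`limit`/`order` options are hypothetical names chosen for illustration, not PAI's actual schema or API.

```javascript
// Hypothetical in-memory job store: a stand-in for the DB, not PAI's schema.
class JobStore {
  constructor() {
    this.jobs = new Map(); // keyed by "{user}~{job}", so each name is unique
  }

  // Submitting the same user/job name twice creates only one record.
  submit(user, job, metadata = {}) {
    const key = `${user}~${job}`;
    if (this.jobs.has(key)) {
      return { created: false, record: this.jobs.get(key) };
    }
    // seq is a simple submission counter used for ordering.
    const record = { key, user, job, metadata, seq: this.jobs.size };
    this.jobs.set(key, record);
    return { created: true, record };
  }

  // List with sorting and paging; newest submissions first by default.
  list({ offset = 0, limit = 10, order = 'desc' } = {}) {
    const sorted = [...this.jobs.values()].sort((a, b) =>
      order === 'asc' ? a.seq - b.seq : b.seq - a.seq);
    return { total: sorted.length, items: sorted.slice(offset, offset + limit) };
  }
}

const store = new JobStore();
store.submit('alice', 'train-1', { team: 'nlp' });
store.submit('alice', 'train-2');
const retry = store.submit('alice', 'train-1'); // same name: no duplicate record
store.submit('bob', 'eval-1');
const page = store.list({ offset: 1, limit: 2 });
console.log(retry.created, page.total, page.items.map((j) => j.key));
```

Keying records by name is what makes a retried submission harmless here: the second `submit` returns the existing record, metadata included, instead of creating a new one.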
Features That Depend on It
- List History Jobs: https://github.com/microsoft/pai/issues/3845, https://github.com/microsoft/pai/issues/4610, https://github.com/microsoft/pai/issues/3935
- Expose K8s events (part of enriching job debugging info): https://github.com/microsoft/pai/issues/4649
New RestServer Architecture
In short, compared with the current architecture, we insert a DB between RestServer and ApiServer.
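A minimal sketch of that flow follows, using in-memory stand-ins for the DB and the ApiServer. Everything here is an assumption made for illustration: the class and function names are hypothetical, and the split into a poller-like push step and a watcher-like merge step is only one plausible reading of how the components could cooperate, not the actual implementation.

```javascript
// Hypothetical in-memory stand-ins; none of these names come from the PAI codebase.
class FakeDb {
  constructor() {
    this.rows = new Map();
  }
  // Insert the row if missing, otherwise merge the patch into it.
  upsert(name, patch) {
    const row = { ...(this.rows.get(name) || { name }), ...patch };
    this.rows.set(name, row);
    return row;
  }
  get(name) {
    return this.rows.get(name);
  }
  all() {
    return [...this.rows.values()];
  }
}

class FakeApiServer {
  constructor() {
    this.frameworks = new Map();
  }
  apply(name, spec) {
    this.frameworks.set(name, { spec, state: 'Running' });
  }
}

// RestServer path: persist the request first, so a read right after the write sees it.
function submitJob(db, name, spec) {
  return db.upsert(name, { spec, synced: false, state: 'Waiting' });
}

// Poller-like step (assumed role): push rows not yet synced to the ApiServer.
function syncPending(db, api) {
  for (const row of db.all().filter((r) => !r.synced)) {
    api.apply(row.name, row.spec);
    db.upsert(row.name, { synced: true });
  }
}

// Watcher-like step (assumed role): copy the observed state back into the DB.
function mergeObserved(db, api) {
  for (const [name, fw] of api.frameworks) {
    db.upsert(name, { state: fw.state });
  }
}

const db = new FakeDb();
const api = new FakeApiServer();
submitJob(db, 'alice~train', { image: 'example' });
const beforeSync = db.get('alice~train').state; // readable before the ApiServer has the job
syncPending(db, api);
mergeObserved(db, api);
const afterSync = db.get('alice~train').state;
console.log(beforeSync, afterSync);
```

Because RestServer reads only from the DB in this sketch, the job is visible immediately after submission, and its state is refreshed once the ApiServer side has been synchronized.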
Sub Tasks
- [ ] Database Controller (StatefulSet with initializer, write merger, watcher, poller, gcer)
  - [x] P0 Database schema
  - [x] P0 Database ORM interface in Node.js
  - [x] P0 Service setup (service.yaml, start.sh, config, etc.)
  - [ ] Initializer (handles schema sync, upgrade, legacy, etc.)
    - [x] P0 Version table setup
    - [ ] P1 Legacy framework transfer @suiguoxin
  - [x] P0 Write merger
  - [x] P0.5 DB poller
  - [ ] API watcher
    - [x] P0 Watcher for framework
    - [ ] P2 Watcher for event
    - [ ] P2 Watcher for pod
  - [ ] #5653
- [ ] Rest API Change
  - [x] P0 POST /api/v2/jobs
  - [x] P0 GET /api/v2/jobs
  - [x] P0 GET /api/v2/jobs/{user}~{job}
  - [x] P0 GET /api/v2/jobs/{user}~{job}/config
  - [x] P0 PUT /api/v2/jobs/{user}~{job}/executionType
  - [x] P0 Swagger update
  - [x] P1 /api/v2/jobs/{user}~{job}/job-attempts @suiguoxin #4716
  - [x] P1 /api/v2/jobs/{user}~{job}/job-attempts/{attemptIndex} @suiguoxin
- [ ] Webportal
  - [x] P1 Paging on list job page
- [ ] Fluentd @suiguoxin #4716
  - [x] P1 Fluentd plugin multithread
  - [x] P1 Fluentd change for new schema (for framework history)
  - [x] P2 Fluentd for Pod
- [ ] Other
  - [x] P1 Framework controller: decrease GC time?
  - [ ] P2 Make PostgreSQL a StatefulSet
Related to https://github.com/microsoft/pai/issues/4600
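The endpoints in the Rest API Change tasks share one addressing scheme, `{user}~{job}`. The sketch below builds those paths from a user and job name. The helper names (`jobPath`, `jobRoutes`) are hypothetical, and the `GET` method for the two `job-attempts` routes is an assumption, since the task list does not state a verb for them.

```javascript
// Paths are taken from the task list; helper names are hypothetical.
const BASE = '/api/v2/jobs';

// A job is addressed as {user}~{job}; "~" is left as-is by encodeURIComponent.
function jobPath(user, job) {
  return `${BASE}/${encodeURIComponent(`${user}~${job}`)}`;
}

const jobRoutes = {
  submit: () => ({ method: 'POST', path: BASE }),
  list: () => ({ method: 'GET', path: BASE }),
  get: (user, job) => ({ method: 'GET', path: jobPath(user, job) }),
  config: (user, job) => ({ method: 'GET', path: `${jobPath(user, job)}/config` }),
  setExecutionType: (user, job) => ({
    method: 'PUT',
    path: `${jobPath(user, job)}/executionType`,
  }),
  // Assumed to be GET: the task list gives no method for these two routes.
  attempts: (user, job) => ({
    method: 'GET',
    path: `${jobPath(user, job)}/job-attempts`,
  }),
  attempt: (user, job, attemptIndex) => ({
    method: 'GET',
    path: `${jobPath(user, job)}/job-attempts/${attemptIndex}`,
  }),
};

console.log(jobRoutes.get('alice', 'train-1').path);
console.log(jobRoutes.attempt('alice', 'train-1', 0).path);
```

Keeping the path construction in one place means a client only has to change `jobPath` if the addressing scheme changes.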