2021 March Release Plan
Release Manager: @Binyang2014
Endgame
- Feature freeze: TBD
- Code freeze: 4.6
- Scrum demo date: TBD
- Bug Bash date: 4.16
- Release date & retrospective date: 4.26

Test Plan: TBD
Top level themes (work item break down needed)
- Marketplace v1 backlog - @debuggy / @yiyione / @TobeyQin, 2 weeks, P0. May need to delay one week. Test Owner: @suiguoxin @debuggy @hzy46
- x-plan - @Binyang2014, TBD. Test Owner: @suiguoxin. Done
Alert-Manager (Test Owner: @Binyang2014, test done)
- [ ] Send alert to user when a job fails #5337 @suiguoxin P1 (deferred)
- [x] Add alert & auto-fix for GPU perf issue #5342 #5383 P0 Test cases: https://github.com/microsoft/pai/pull/5383#issuecomment-812286658
- [x] Add `kill-long-running-job` email templates #5384 Test cases: https://github.com/microsoft/pai/pull/5384#issuecomment-812290954
Rest Server (Test Owner: @yiyione, done)
- [x] Support sorting by `completionTime` in the list-jobs API #5347 @suiguoxin
- [x] API change #5375 P0 test cases: https://github.com/microsoft/pai/pull/5375#issuecomment-812292773
- [x] Apply this API in cluster-utilization #5376 P0
Deployment (Test Owner: @Binyang2014)
- [x] Add / remove nodes with `layout.yaml` #5321 #5167 @Starmys P0 Test done
- [x] webportal package build issue #5378 @suiguoxin P0 Test cases: https://github.com/microsoft/pai/pull/5378#issue-592689628
- [x] K8s API server's cert needs to be renewed each year #5334 P0 Test cases: https://github.com/microsoft/pai/issues/5334#issuecomment-815412607
Documents
- [x] Doc for renewing the API server cert @yiyione
- [x] Document for config.yaml @Starmys
- [x] Document for new submission page, user manual @debuggy
- [x] Doc for adding / removing nodes
- [x] Doc for Nvidia driver version mismatch
Other backlogs
Use case & best practice
- Use case and best practice summary - @hzy46 / @TobeyQin / @suiguoxin
  - OpenPAI Advantage
  - OpenPAI Best Practice -- P0 topics:
    - Cluster setup and onboarding
    - Utilization weekly report
    - Storage
    - How to debug
    - Leverage low-priority resources
    - AutoML
- ~Job profiling @hzy46 P1~ Not in PAI release
- HiveD user experience support - TBD @yangou1988 VC view page (design review in this release) P1
- HiveD: convert old test cases and propose new test cases @hzy46 P1
- Dataset: integrate data prerequisite into marketplace and job submission page @hzy46 #5345 TBD P1