2021 March Release Plan

Open • scarlett2018 opened this issue 3 years ago • 0 comments

Release Manager

@Binyang2014

Endgame

  • Feature freeze: TBD
  • Code freeze: 4.6
  • Scrum demo date: TBD
  • Bug Bash date: 4.16
  • Release date & retrospective date: 4.26

Test Plan:

TBD

Top level themes (work item break down needed)

  • Marketplace v1 backlog - @debuggy / @yiyione / @TobeyQin, 2 weeks, P0 (may need to delay one week). Test Owner: @suiguoxin @debuggy @hzy46

    Marketplace Mar. Release Plan #233

  • x-plan - @Binyang2014 TBD Test Owner: @suiguoxin Done

Alert-Manager Test Owner: @Binyang2014 Test done

  • [ ] Send alert to user when job fails #5337 @suiguoxin P1 (deferred)
  • [x] Add alert & auto-fix for GPU perf issue #5342 #5383 P0 Test cases: https://github.com/microsoft/pai/pull/5383#issuecomment-812286658
  • [x] Add kill-long-running-job email templates #5384 Test cases: https://github.com/microsoft/pai/pull/5384#issuecomment-812290954

Rest Server Test Owner: @yiyione Done

  • [x] Support sorting by completionTime in the list-jobs API #5347 @suiguoxin
    • [x] API change #5375 P0 test cases: https://github.com/microsoft/pai/pull/5375#issuecomment-812292773
    • [x] Application of this API in cluster-utilization #5376 P0

Deployment Test Owner: @Binyang2014

  • [x] add / remove nodes with layout.yaml #5321 #5167 @Starmys P0 Test Done
  • [x] webportal package build issue #5378 @suiguoxin P0 Test cases: https://github.com/microsoft/pai/pull/5378#issue-592689628
  • [x] K8s API server's cert needs to be renewed each year #5334 P0 Test cases: https://github.com/microsoft/pai/issues/5334#issuecomment-815412607

Documents

  • [x] Doc for renewing the API server cert @yiyione
  • [x] Document for config.yaml @Starmys
  • [x] Document for new submission page, user manual @debuggy
  • [x] Doc for add / remove nodes
  • [x] Doc for Nvidia driver version mismatch

Other backlogs

Use case & best practice

  • Use case and best practice summary - @hzy46 / @TobeyQin / @suiguoxin
    • OpenPAI Advantage
    • OpenPAI Best Practice -- P0 topics:
      • Cluster setup and onboarding
      • Utilization weekly report
      • Storage
      • How to debug
      • Leverage low-priority resources
      • AutoML
  • ~Job profiling @hzy46 P1~ Not in PAI release
  • HiveD user experience support - TBD @yangou1988 VC view page (design review in this release) P1
  • HiveD convert old test cases and propose new test cases @hzy46 P1
  • Dataset: integrate data prerequisite into marketplace and job submission page @hzy46 #5345 TBD P1

scarlett2018, Mar 04 '21 07:03