training_policies

Work Estimate / Blockers for References (gradient accumulation and convergence understanding)

Open bitfort opened this issue 5 years ago • 4 comments

Reference owners, please update the yellow cells with a work estimate (in weeks/days) or the blocking issue that needs to be resolved: https://docs.google.com/spreadsheets/d/1W8L8SBIrgbJ_f_-2hUt8SqLNkzAvsKNkQ0A6pKWz9_8/edit#gid=0

bitfort avatar Sep 10 '20 15:09 bitfort

SWG:

Will request that reference owners add status updates to the spreadsheet next week.

bitfort avatar Nov 05 '20 17:11 bitfort

We had requested a status update in this spreadsheet: https://docs.google.com/spreadsheets/d/1W8L8SBIrgbJ_f_-2hUt8SqLNkzAvsKNkQ0A6pKWz9_8/edit#gid=0

We will touch base next week:

  1. Does it need gradient accumulation?
  2. Status on adding gradient accumulation?
  3. Convergence Curve - https://drive.google.com/drive/u/0/folders/1sDmlkLyehFcQWEEW8IhQUbLafaPhTE-9
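For owners weighing question 1, here is a minimal sketch of what gradient accumulation does, using a toy one-parameter model rather than any actual reference. The names `mse_grad` and `accumulated_grad` are hypothetical and exist only for this illustration. It shows that summing micro-batch gradients, each weighted by its share of the full batch, reproduces the full-batch gradient.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_bs):
    """Accumulate micro-batch gradients into one full-batch gradient.

    Illustrative assumption: the loss is a mean over samples, so each
    micro-batch gradient is weighted by len(micro_batch) / len(full_batch).
    """
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_bs):
        mx = xs[start:start + micro_bs]
        my = ys[start:start + micro_bs]
        total += mse_grad(w, mx, my) * len(mx) / n
    return total


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
full = mse_grad(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_bs=2)
print(full, accum)
```

In a real reference the optimizer step would be applied only after the last micro-batch, so the effective batch size is the full batch even when only a micro-batch fits in memory.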

Convergence Curves: Run 2x the number of runs required for a submission, spread across batch sizes ranging from the historical minimum submitted batch size to the historical maximum, stepping in powers of 2 starting at the minimum and going up to the maximum.
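One possible reading of the convergence-curve plan, sketched in Python. The helper names `batch_size_sweep` and `plan_runs` are hypothetical. The even split of runs across batch sizes and the inclusion of the maximum as a final point when it is not an exact doubling of the minimum are both assumptions, not something the notes specify.

```python
def batch_size_sweep(min_bs, max_bs):
    """Double from min_bs until reaching max_bs; max_bs is always the last point."""
    if min_bs <= 0 or max_bs < min_bs:
        raise ValueError("need 0 < min_bs <= max_bs")
    sizes = []
    bs = min_bs
    while bs < max_bs:
        sizes.append(bs)
        bs *= 2
    sizes.append(max_bs)  # assumption: always include the max submitted size
    return sizes


def plan_runs(min_bs, max_bs, required_runs):
    """Spread 2x the required submission runs as evenly as possible over the sweep."""
    sizes = batch_size_sweep(min_bs, max_bs)
    total = 2 * required_runs
    base, extra = divmod(total, len(sizes))
    # The first `extra` batch sizes get one additional run each.
    return {bs: base + (1 if i < extra else 0) for i, bs in enumerate(sizes)}


# Hypothetical numbers, for illustration only.
print(plan_runs(min_bs=256, max_bs=2048, required_runs=5))
```

The batch sizes and run count in the usage line are placeholders; each reference owner would substitute the historical minimum and maximum for their benchmark and its required run count.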

bitfort avatar Nov 19 '20 17:11 bitfort

In addition to gradient accumulation and convergence curves, we also need to update logging to the latest v0.7 (or v1.0?) spec. I've added a column in the status spreadsheet for this.

johntran-nv avatar Jan 04 '21 19:01 johntran-nv

From this week's meeting:

  • Tracking spreadsheet: https://docs.google.com/spreadsheets/d/1W8L8SBIrgbJ_f_-2hUt8SqLNkzAvsKNkQ0A6pKWz9_8/edit?usp=sharing
  • Reminded group that we plan to freeze on 1/22
  • Review existing pull requests at https://github.com/mlcommons/training/pulls; all are to be resolved in the next 2 weeks
  • If you lack permissions to contribute, please update
  • We should make a label to indicate which PRs are going to impact the references
  • Follow up over email on status of Minigo
  • Work is in progress for every reference.
  • New action item for all reference owners: fix logging. This is not needed by the freeze, but we want it shortly after. [AI JohnT] Email to be sent to owners (sent on 1/7/21).
  • Want convergence curves by freeze deadline. Let others know if you need help with this.

johntran-nv avatar Jan 08 '21 23:01 johntran-nv