Benchmark decision-making process

Some principles are needed:

1. How many benchmarks can we support?
2. Roughly how should we divide those benchmarks into categories to cover the ML space?
3. How cutting-edge should the benchmarks be? a. Should they reflect research? b. Should they reflect practical industry use?
4. What is the timeline for benchmark rotation? a. What is the cadence for changing benchmarks: one year or six months? b. What is the timeline for selecting candidates? For making a choice? For having a reference ready relative to submission?
5. How do we make implementation choices?
6. What is the process for making a decision? Voting or consensus?
7. How do we align with inference?
8. How do we align with customers?
Paulius has a similar draft that we can discuss.
I’d also add: what should we do with old benchmarks?
Given the effort invested, it would seem like a good idea for us to at least keep the old ones around, but marked as deprecated.
David
On Thu, Jul 11, 2019 at 8:41 PM Frank Wei [email protected] wrote:
I wonder if we can cover the points below too:
- a regulated process for introducing a new benchmark, reflecting market needs and competitors' votes.
- how to align with inference: how to associate training models with inference models (say, via model compression), which compression techniques are allowed for benchmarks (if any beyond quantization), etc.