Benchmark decision-making process

Some principles are needed:

1. How many benchmarks can we support?
2. Roughly how should we divide those benchmarks into categories to cover the ML space?
3. How cutting-edge should the benchmarks be? a. Should they reflect research? b. Should they reflect practical industry use?
4. What is the timeline for benchmark rotation? a. What is the cadence for changing benchmarks: one year or six months? b. What is the timeline for selecting candidates? For making a choice? For having a reference ready relative to submission?
5. How do we make implementation choices?
6. What is the process for making a decision? Voting or consensus?
7. How do we align with inference?
8. How do we align with customers?
Paulius has a similar draft that we can discuss.
I’d also add: what should we do with old benchmarks?
Given the effort invested, it would seem like a good idea for us to at least keep the old ones around, but marked as deprecated.
David
On Thu, Jul 11, 2019 at 8:41 PM Frank Wei [email protected] wrote:
I wonder if we can cover the points below too:
- a regulated process for introducing a new benchmark, reflecting market needs and competitors' votes.
- how to align with inference: how to associate training models with inference models (say, via model compression), which compression techniques are allowed for benchmarks (if any beyond quantization), etc.