jonathan-cohen-nvidia issues

Results: 12 issues of jonathan-cohen-nvidia

Clarify that "available" includes systems which are supported but no longer available for purchase

Some submissions this round will likely be on NVIDIA systems that are supported but which cannot be purchased directly anymore. To be consistent with the intention of "available" we think...

Backlog

Rec: Decision only

Ensure that all submitters have a review

We discussed this in the past (https://github.com/mlperf/training_policies/issues/250), but I'm not sure we ever actually came up with a process for assigning reviewers. We should discuss what this process will be.

Rec: Rules Change

Correction to Transformer eval set

We believe the Transformer reference implementation points to an incorrect eval dataset here: https://github.com/mlperf/training/tree/master/translation#quality-metric This eval dataset has 2737 lines. In 0.6, both NVIDIA & Google used the eval...

Rec: Code Change

Mask-RCNN clarification around num_image_candidates

There was a rule previously that you could choose to do either: Process the whole batch at a time and generate the total required number of patches (1000 * batches...

Rec: Decision only

Clarify rules around use of pretrained weights

General principle is that if you can, you should use pretrained weights provided by MLPerf (which should be hosted & indicated in the readme). If you are unable to do...

Backlog

Discussion around what makes a submission "available"

Flagging items for discussion in the rules doc. Probably best for post 0.7 discussion. https://github.com/mlperf/policies/pull/28

Backlog

DLRM - custom embedding kernels should be allowed

(moved from https://github.com/mlperf/training/issues/373) The kernel that copies data to the GPU embedding tables is effectively a glorified memcpy. However, depending on the layout, the hardware, the encoding, etc - it...

Rec: Decision only

Allow available software to be released before publication, not submission

For 0.7, to mitigate the schedule compression (from 4 months to 3) and the variation in different submitters' QA burden/cycle time, the proposal is to allow for a submission to be...

Backlog

Submitter should be responsible for showing submission does not converge better than reference.

For each submitted HP, submitter should include logs from reference code run to demonstrate that convergence matches the reference. Without this, the onus is on the reviewers to do it...

Backlog

Minimize hyperparameter stealing post submission

We don't know how to actually achieve this, but archiving for the purpose of generating a committee discussion. We would prefer a process where hyperparameter selections are shared prior...

Backlog