jonathan-cohen-nvidia

Results 12 issues of jonathan-cohen-nvidia

Some submissions this round will likely be on NVIDIA systems that are supported but can no longer be purchased directly. To be consistent with the intention of "available," we think...

Backlog
Rec: Decision only

We discussed this in the past (https://github.com/mlperf/training_policies/issues/250), but it is not clear to me that we ever actually came up with a process for assigning reviewers. We should discuss what this process will be.

Rec: Rules Change

We believe the Transformer reference implementation points to an incorrect eval dataset (https://github.com/mlperf/training/tree/master/translation#quality-metric). This eval dataset has 2737 lines. In 0.6, both NVIDIA and Google used the eval...

Rec: Code Change

There was a rule previously that you could choose to do either: Process the whole batch at a time and generate the total required number of patches (1000 * batches...

Rec: Decision only

General principle is that if you can, you should use pretrained weights provided by MLPerf (which should be hosted & indicated in the readme). If you are unable to do...

Backlog

Flagging items for discussion in the rules doc. Probably best for post 0.7 discussion. https://github.com/mlperf/policies/pull/28

Backlog

(moved from https://github.com/mlperf/training/issues/373) The kernel that copies data to the GPU embedding tables is effectively a glorified memcpy. However, depending on the layout, the hardware, the encoding, etc., it...

Rec: Decision only

For 0.7, to mitigate the schedule compression (from 4 months to 3) and the variation in different submitters' QA burden/cycle time, the proposal is to allow for a submission to be...

Backlog

For each submitted HP, the submitter should include logs from a reference code run to demonstrate that convergence matches the reference. Without this, the onus is on the reviewers to do it...

Backlog

We don't know how to actually achieve this, but we are archiving it for the purpose of generating a committee discussion. We would prefer a process where hyperparameter selections are shared prior...

Backlog