jonathan-cohen-nvidia
jonathan-cohen-nvidia
Clarify that "available" includes systems which are supported but no longer available for purchase
Some submissions this round will likely be on NVIDIA systems that are supported but which cannot be purchased directly anymore. To be consistent with the intention of "available" we think...
We discussed this in the past (https://github.com/mlperf/training_policies/issues/250) but I'm not clear we ever actually came up with a process for assigning reviewers. We should discuss what this will be.
We believe the Transformer reference implementation points to an incorrect eval dataset here: https://github.com/mlperf/training/tree/master/translation#quality-metric This eval data set has 2737 lines. In 0.6, both NVIDIA & Google, used the eval...
There was a rule previously that you could choose to do either: Process the whole batch at a time and generate the total required number of patches (1000 * batches...
General principle is that if you can, you should use pretrained weights provided by MLPerf (which should be hosted & indicated in the readme). If you are unable to do...
Flagging items for discussion in the rules doc. Probably best for post 0.7 discussion. https://github.com/mlperf/policies/pull/28
(moved from https://github.com/mlperf/training/issues/373) The kernel that copies data to the GPU embedding tables is effectively a glorified memcpy. However, depending on the layout, the hardware, the encoding, etc - it...
For 0.7 to mitigate the schedule compression (from 4 months to 3) and the variation in different submitter's QA burden/cycle time, proposal is to allow for a submission to be...
For each submitted HP, submitter should include logs from reference code run to demonstrate that convergence matches the reference. Without this, the onus is on the reviewers to do it...
We don't know how to actually achieve this, but archiving for the purpose of generating a committee discussion. We would prefer a process where hyper parameter selections are shared prior...