inference Open Division Rule Clarification Request

According to https://github.com/mlcommons/inference/blob/master/Submission_Guidelines.md#expected-time-to-do-benchmark-runs

There is no constraint on the model used also except that the model must be trained on the dataset used in the corresponding MLPerf inference task.

By "the dataset used in the corresponding MLPerf inference task", does it mean the validation set used for accuracy testing in the closed division's same workload? It does not quite make sense to enforce this so that retraining of the open division model can be overfitted to the validation dataset.

If it means the original dataset used to train the closed division model, what would the submitters use if the original dataset is not publicly available? In this case this rule needs to be relaxed, such as the submitters can choose their own training dataset except the validation dataset and have to publish the dataset they used for training as a complementary material for open division submission.

Nov 01 '23 22:11 nvyihengz

@mrmhodak We need to discuss this ASAP in WGM.

Nov 13 '23 21:11 nv-ananjappa

except that the model must be trained

I think "trained" should be replaced with "validated".

such as the submitters can choose their own training dataset except the validation dataset and have to publish the dataset they used for training as a complementary material for open division submission.

Requiring to publish the training dataset may be a step too far?

Nov 13 '23 22:11 psyhtest