
Add hparams and compliance checks for training and eval samples for all benchmarks

Open: emizan76 opened this issue 3 years ago • 3 comments

According to issue https://github.com/mlcommons/submission_training_1.0/issues/39, the number of training samples is 117266. Many submissions hardcode this value, even though the reference does not. In that issue, the submission in question used a different value.

The decision was to add train_samples and eval_samples as hyperparameters, plus the related compliance checker rules, so that we avoid such issues in the future.
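As a rough illustration of what such a rule has to do, here is a minimal sketch that scans a result log for the two hyperparameters and compares them with reference values. It assumes log lines of the form `:::MLLOG {json}` with "key" and "value" fields; the function names `extract_logged_values` and `check_sample_counts` are hypothetical and are not the compliance checker's actual API.

```python
import json

# Assumed prefix for MLPerf-style log lines; anything before it is ignored.
MLLOG_PREFIX = ":::MLLOG "


def extract_logged_values(log_lines, keys):
    """Collect every value logged for each key in `keys` (hypothetical helper)."""
    found = {key: [] for key in keys}
    for line in log_lines:
        idx = line.find(MLLOG_PREFIX)
        if idx < 0:
            continue  # not a structured log line
        try:
            record = json.loads(line[idx + len(MLLOG_PREFIX):])
        except json.JSONDecodeError:
            continue  # malformed payload; a real checker would likely flag this
        key = record.get("key")
        if key in found:
            found[key].append(record.get("value"))
    return found


def check_sample_counts(log_lines, reference):
    """Return violations; an empty list means every key matched its reference.

    Each key must be logged exactly once and equal the reference value.
    """
    errors = []
    found = extract_logged_values(log_lines, reference.keys())
    for key, expected in reference.items():
        values = found[key]
        if len(values) != 1:
            errors.append(f"{key}: expected exactly one entry, found {len(values)}")
        elif values[0] != expected:
            errors.append(f"{key}: logged {values[0]}, reference is {expected}")
    return errors
```

For example, `check_sample_counts(lines, {"train_samples": 117266})` would flag a run that logged a different training-sample count, which is the situation described in issue 39. Requiring exactly one entry also catches runs that omit the hyperparameter entirely.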

emizan76, Jun 14 '21 18:06

I assume we should check that train_samples and eval_samples match the reference values for all benchmarks, since we noted the same problem for RN50 in https://github.com/mlcommons/submission_training_1.0/issues/48 as well.
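Extending the check to every benchmark amounts to keeping one table of reference counts and comparing each run against its benchmark's entry. The sketch below assumes the logged values have already been parsed into a dict. The table name `REFERENCE_SAMPLES` and function `check_benchmark` are hypothetical, and the RN50 numbers are assumed to be the standard ImageNet-1k split sizes, not values copied from the reference implementation.

```python
# Hypothetical per-benchmark reference table. ASSUMPTION: the "resnet" entry
# uses the standard ImageNet-1k train/validation sizes; real values should be
# taken from each benchmark's reference implementation.
REFERENCE_SAMPLES = {
    "resnet": {"train_samples": 1281167, "eval_samples": 50000},
}


def check_benchmark(benchmark, logged, table=REFERENCE_SAMPLES):
    """Compare one run's logged sample counts with the reference table.

    `logged` maps hyperparameter names to the values a submission logged.
    A benchmark missing from the table is itself reported, so a newly added
    benchmark cannot silently skip the check.
    """
    reference = table.get(benchmark)
    if reference is None:
        return [f"{benchmark}: no reference sample counts defined"]
    errors = []
    for key, expected in reference.items():
        if key not in logged:
            errors.append(f"{benchmark}: {key} was not logged")
        elif logged[key] != expected:
            errors.append(f"{benchmark}: {key}={logged[key]}, reference is {expected}")
    return errors
```

Failing closed on unknown benchmarks is a deliberate choice here: with "all benchmarks" in scope, a forgotten table entry should surface as an error rather than as a silent pass.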

If you agree, let's edit the title of the issue.

nv-rborkar, Jun 16 '21 17:06

Good point. Marek, since this covers all the benchmarks, let me know if you need any help.

So, restating the problem: in the v1.0 submission round, the training and eval sample counts were found to be off for a couple of submissions. This also happened in v0.7 and went undetected.

Let's add compliance checks to avoid such issues in the future.

emizan76, Jun 16 '21 18:06

I'll be addressing this issue; the fix is scheduled for v2.0.

shangw-nvidia, Nov 10 '21 18:11