Add hparams and compliance checks for training and eval samples for all benchmarks
According to issue https://github.com/mlcommons/submission_training_1.0/issues/39, the number of training samples is 117266. Many submissions hardcode this value, even though the reference does not. In that specific issue, the submission in question used a different value.
The decision was to add train_samples and eval_samples as hyperparameters, along with the related compliance checker rules, so that we avoid such issues in the future.
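As a concrete illustration, a benchmark reference could emit these two values through the mlperf_logging mllog API. This is a minimal sketch, assuming the constant names TRAIN_SAMPLES and EVAL_SAMPLES; the 117266 figure comes from the issue above, while the eval count of 5000 is just an illustrative placeholder, not a verified reference value:

```python
from mlperf_logging import mllog

# Write MLPerf-style log lines to a result file.
mllog.config(filename="result_0.txt")
mllogger = mllog.get_mllogger()

# Log the dataset sizes actually used for training and evaluation,
# so the compliance checker can compare them against reference values.
# 117266 is the training-set size discussed in the linked issue;
# 5000 is an illustrative eval count only.
mllogger.event(key=mllog.constants.TRAIN_SAMPLES, value=117266)
mllogger.event(key=mllog.constants.EVAL_SAMPLES, value=5000)
```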
I am assuming the check should require train_samples and eval_samples to match the reference values for all benchmarks, since we noted the same thing for RN50 in https://github.com/mlcommons/submission_training_1.0/issues/48.
If you agree, let's edit the title of the issue.
Good point. Marek, since this covers all the benchmarks, let me know if you need any help.
So, restating the problem: in the 1.0 submission round, the training and eval sample counts were found to be off for a couple of submissions. This also happened in 0.7 and went undetected.
Let's add compliance checks to avoid such issues in the future.
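For the checker side, here is a minimal sketch of the logic such a check could apply, assuming a per-benchmark table of reference counts (hypothetical except for the 117266 figure) and the `:::MLLOG` line prefix that mlperf_logging emits. The actual compliance checker expresses rules as config files, so this Python is only an illustration of the comparison, not the real rule format:

```python
import json

# Hypothetical per-benchmark reference counts. Only the 117266 training
# figure comes from the issue above; the rest are placeholders.
REFERENCE_SAMPLES = {
    "maskrcnn": {"train_samples": 117266, "eval_samples": 5000},
}

def check_sample_counts(log_path, benchmark):
    """Compare logged train/eval sample counts against reference values."""
    expected = REFERENCE_SAMPLES[benchmark]
    seen = {}
    with open(log_path) as f:
        for raw in f:
            line = raw.strip()
            # mlperf_logging emits lines of the form ':::MLLOG {json}'.
            if not line.startswith(":::MLLOG"):
                continue
            entry = json.loads(line[len(":::MLLOG"):])
            if entry["key"] in expected:
                seen[entry["key"]] = entry["value"]
    return [
        f"{key}: logged {seen.get(key)} does not match reference {value}"
        for key, value in expected.items()
        if seen.get(key) != value
    ]

# Usage: print(check_sample_counts("result_0.txt", "maskrcnn"))
```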
I'll be addressing this issue; it is scheduled to go in for v2.0.