Ritika Borkar
I am assuming we make the check on train_samples & eval_samples match the reference values for all benchmarks, as noted for RN50 in https://github.com/mlcommons/submission_training_1.0/issues/48 as well. If you...
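A minimal sketch of the kind of check being discussed: verify that logged train_samples / eval_samples match the benchmark's reference values. The function name, the table, and its contents are illustrative assumptions, not the actual compliance checker's API or the official reference counts.

```python
# Hypothetical sketch of a train/eval sample-count compliance check.
# REFERENCE_SAMPLES values are placeholders, not official numbers.

REFERENCE_SAMPLES = {
    # benchmark: (train_samples, eval_samples) -- illustrative only
    "resnet": (1281167, 50000),
}

def check_sample_counts(benchmark, logged_train, logged_eval):
    """Return a list of error strings; an empty list means the check passed."""
    errors = []
    ref_train, ref_eval = REFERENCE_SAMPLES[benchmark]
    if logged_train != ref_train:
        errors.append(
            f"{benchmark}: train_samples={logged_train}, expected {ref_train}")
    if logged_eval != ref_eval:
        errors.append(
            f"{benchmark}: eval_samples={logged_eval}, expected {ref_eval}")
    return errors
```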
@shangw-nvidia , @emizan76 Will the compliance checker verify this for v1.1?
@emizan76 is this something you can help with?
Thanks Elias. Can we expect this for v1.0?
@sgpyc to keep me honest. LAMB is not an allowed optimizer for RN50; only LARS and SGD are allowed. https://github.com/mlcommons/training/tree/master/image_classification#optimizer The rules already allow apex.optimizers.FusedSGD [here](https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#15-appendix-allowed-closed-division-optimizers)
Thanks, the implementation looks good as long as Nesterov momentum is not used (the reference doesn't use Nesterov).
We also have some RCPs that break the non-decreasing requirement with respect to increasing batch size. This is possible if better hparams were not known at the time these RCPs were...
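To illustrate the non-decreasing requirement mentioned above: reference epochs to converge are expected not to decrease as batch size grows, so an RCP where a larger batch size converges in fewer epochs is suspect. The function below is a hypothetical sketch, not the actual RCP checker.

```python
# Illustrative check for RCPs that violate the non-decreasing
# epochs-vs-batch-size expectation. Data format is assumed.

def find_non_monotonic_rcps(rcps):
    """rcps: list of (batch_size, mean_epochs_to_converge) pairs, any order.

    Returns adjacent (by batch size) RCP pairs where the larger batch
    size converged in fewer epochs, i.e. candidates for re-tuning.
    """
    ordered = sorted(rcps)
    violations = []
    for (bs_a, ep_a), (bs_b, ep_b) in zip(ordered, ordered[1:]):
        if ep_b < ep_a:  # larger batch, fewer epochs -> suspicious RCP
            violations.append(((bs_a, ep_a), (bs_b, ep_b)))
    return violations
```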
@davidjurado is the issue you observed resolved now?
@davidjurado can you please address Shriya's feedback. We can then merge this PR.
Discussed in Training WG (3/28): @itayhubara is verifying whether setting this value correctly affects convergence, and whether it can improve convergence or reduce the coefficient of variation in the RCPs.