training_results_v0.7
Getting run-time from NVIDIA gnmt and transformer logs
I am trying to derive run-times from the RESULT statements at the end of the logs in training_results_v0.7/NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/ and training_results_v0.7/NVIDIA/results/dgxa100_ngc20.06_pytorch/transformer/, taking the fourth comma-separated field as the run-time in seconds.
For gnmt:
$ for i in `ls NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/result_*` ; do grep -m1 "^RESULT" $i ; done
RESULT,RNN_TRANSLATOR,,618,nvidia,2020-06-17 07:13:26 PM
RESULT,RNN_TRANSLATOR,,499,nvidia,2020-06-17 07:13:26 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 07:13:27 PM
RESULT,RNN_TRANSLATOR,,501,nvidia,2020-06-17 07:13:23 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 07:13:28 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 07:13:29 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 09:16:36 PM
RESULT,RNN_TRANSLATOR,,502,nvidia,2020-06-17 09:16:37 PM
RESULT,RNN_TRANSLATOR,,502,nvidia,2020-06-17 09:16:39 PM
RESULT,RNN_TRANSLATOR,,501,nvidia,2020-06-17 09:17:25 PM
Here, the average, after ignoring the fastest and slowest run-times, is 8.35 minutes.
(500+501+500+500+500+502+502+501)/(8*60)
8.34583333333333333333
Similarly for transformer:
$ for i in `ls NVIDIA/results/dgxa100_ngc20.06_pytorch/transformer/result_*` ; do grep -m1 "^RESULT" $i ; done
RESULT,transformer,22836,505,root,2020-06-23 02:24:41 PM
RESULT,transformer,24009,502,root,2020-06-23 02:24:40 PM
RESULT,transformer,2723,504,root,2020-06-23 02:24:38 PM
RESULT,transformer,26020,502,root,2020-06-23 02:24:39 PM
RESULT,transformer,22438,504,root,2020-06-23 02:24:39 PM
RESULT,transformer,4462,502,root,2020-06-23 02:24:39 PM
RESULT,transformer,16552,657,root,2020-06-23 02:24:39 PM
RESULT,transformer,21684,503,root,2020-06-23 02:24:46 PM
RESULT,transformer,29290,502,root,2020-06-23 02:24:39 PM
RESULT,transformer,13023,502,root,2020-06-23 02:24:39 PM
Here, the average, after ignoring the fastest and slowest run-times, is 8.38 minutes.
(505+502+504+502+504+502+503+502)/(8*60)
8.38333333333333333333
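For reference, the manual arithmetic for both benchmarks can be reproduced with a small shell helper. This is only a sketch of the calculation described here, not the official scoring method: the function name `trimmed_mean_minutes` is made up for illustration, and it assumes the fourth comma-separated field of each RESULT line is the run-time in seconds.

```shell
# Hypothetical helper (not part of the repository): reads RESULT lines on
# stdin, drops one fastest and one slowest run, and prints the mean of the
# remaining runs in minutes.
# Assumption: field 4 of each RESULT line is the run-time in seconds.
trimmed_mean_minutes() {
  awk -F, '/^RESULT/ { print $4 }' | sort -n | awk '
    { t[NR] = $1 }                      # sorted run-times, ascending
    END {
      if (NR < 3) {
        print "need at least 3 runs" > "/dev/stderr"
        exit 1
      }
      # skip t[1] (fastest) and t[NR] (slowest)
      for (i = 2; i < NR; i++) sum += t[i]
      printf "%.2f\n", sum / ((NR - 2) * 60)
    }'
}
```

With GNU grep, `grep -h -m1 "^RESULT" NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/result_* | trimmed_mean_minutes` should give 8.35 for the gnmt logs listed above, and the same pipeline on the transformer directory should give 8.38.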
But the timings reported on the MLCommons results page are 7.81 minutes for gnmt and 7.84 minutes for transformer.
Is there a different way of calculating run-times from these logfiles?