training_results_v0.7

Getting run-time from NVIDIA gnmt and transformer logs

Open nileshnegi opened this issue 3 years ago • 0 comments

I am trying to compute run-times from the RESULT statements at the end of the log files in training_results_v0.7/NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/ and training_results_v0.7/NVIDIA/results/dgxa100_ngc20.06_pytorch/transformer/.

For gnmt:

$ for i in `ls NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/result_*` ; do grep -m1 "^RESULT" $i ; done
RESULT,RNN_TRANSLATOR,,618,nvidia,2020-06-17 07:13:26 PM
RESULT,RNN_TRANSLATOR,,499,nvidia,2020-06-17 07:13:26 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 07:13:27 PM
RESULT,RNN_TRANSLATOR,,501,nvidia,2020-06-17 07:13:23 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 07:13:28 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 07:13:29 PM
RESULT,RNN_TRANSLATOR,,500,nvidia,2020-06-17 09:16:36 PM
RESULT,RNN_TRANSLATOR,,502,nvidia,2020-06-17 09:16:37 PM
RESULT,RNN_TRANSLATOR,,502,nvidia,2020-06-17 09:16:39 PM
RESULT,RNN_TRANSLATOR,,501,nvidia,2020-06-17 09:17:25 PM

Here, the average, after ignoring the fastest and slowest run-times, is 8.35 minutes.

(500+501+500+500+500+502+502+501)/(8*60)
8.34583333333333333333

Similarly for transformer:

$ for i in `ls NVIDIA/results/dgxa100_ngc20.06_pytorch/transformer/result_*` ; do grep -m1 "^RESULT" $i ; done
RESULT,transformer,22836,505,root,2020-06-23 02:24:41 PM
RESULT,transformer,24009,502,root,2020-06-23 02:24:40 PM
RESULT,transformer,2723,504,root,2020-06-23 02:24:38 PM
RESULT,transformer,26020,502,root,2020-06-23 02:24:39 PM
RESULT,transformer,22438,504,root,2020-06-23 02:24:39 PM
RESULT,transformer,4462,502,root,2020-06-23 02:24:39 PM
RESULT,transformer,16552,657,root,2020-06-23 02:24:39 PM
RESULT,transformer,21684,503,root,2020-06-23 02:24:46 PM
RESULT,transformer,29290,502,root,2020-06-23 02:24:39 PM
RESULT,transformer,13023,502,root,2020-06-23 02:24:39 PM

Here, the average, after ignoring the fastest and slowest run-times, is 8.38 minutes.

(505+502+504+502+504+502+503+502)/(8*60)
8.38333333333333333333
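
For reference, the calculation used for both benchmarks can be sketched as two small shell helpers. This is only a sketch of the arithmetic in this post, not the official scoring method: it assumes the fourth comma-separated field of each RESULT line is the run-time in seconds, and the function names `result_seconds` and `trimmed_mean_minutes` are hypothetical, not part of the repository.

```shell
# Print the 4th comma-separated field of the first RESULT line in each
# log file given as an argument.
# Assumption: that field is the run-time in seconds.
result_seconds() {
    for f in "$@"; do
        grep -m1 '^RESULT' "$f" | cut -d, -f4
    done
}

# Read run-times in seconds (one per line) on stdin, drop the single
# fastest and the single slowest, and print the mean of the rest in
# minutes, rounded to two decimals.
trimmed_mean_minutes() {
    sort -n | awk '
        { t[NR] = $1 }
        END {
            if (NR < 3) { print "need at least 3 runs" > "/dev/stderr"; exit 1 }
            # After sorting, entry 1 is the fastest and entry NR the slowest.
            for (i = 2; i < NR; i++) sum += t[i]
            printf "%.2f\n", sum / (NR - 2) / 60
        }'
}
```

Run as `result_seconds NVIDIA/results/dgxa100_ngc20.06_pytorch/gnmt/result_* | trimmed_mean_minutes`. Fed the ten gnmt values listed above it prints 8.35, and fed the ten transformer values it prints 8.38, matching the hand calculations.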

But the timings reported on the MLCommons results page are 7.81 minutes for gnmt and 7.84 minutes for transformer.

Is there a different way of calculating run-times from these logfiles?

nileshnegi, May 13 '21 05:05