training_results_v0.6
NVIDIA v0.6 transformer implementation benchmark data download
This is regarding training_results_v0.6/NVIDIA/benchmarks/transformer/implementations/pytorch/
I'm facing many errors when attempting to run bash run_preprocessing.sh && bash run_conversion.sh. I am running the scripts inside the NGC container. The first error I face is that the URLs in the following lines no longer work:
wget https://raw.githubusercontent.com/tensorflow/models/master/official/transformer/test_data/newstest2014.en -O /workspace/translation/examples/translation/wmt14_en_de/newstest2014.en
wget https://raw.githubusercontent.com/tensorflow/models/master/official/transformer/test_data/newstest2014.de -O /workspace/translation/examples/translation/wmt14_en_de/newstest2014.de
I replaced them with cp newstest2014.* /workspace/translation/examples/translation/wmt14_en_de/
as the files are already in the pytorch directory.
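For anyone who prefers to script this workaround, here is a minimal Python sketch of the same copy step. The helper name copy_test_sets is hypothetical (it is not part of the repository), and it assumes the newstest2014.* files sit in the directory passed as src_dir.

```python
import glob
import os
import shutil


def copy_test_sets(src_dir, dest_dir, pattern="newstest2014.*"):
    """Copy files matching `pattern` from src_dir into dest_dir.

    Hypothetical helper, not part of the benchmark repository.
    Returns the sorted list of copied file names.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        if os.path.isfile(path):
            shutil.copy(path, dest_dir)
            copied.append(os.path.basename(path))
    return copied
```

Calling copy_test_sets(".", "/workspace/translation/examples/translation/wmt14_en_de") from the pytorch directory should be equivalent to the cp command above.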
The next error is an import error from:
from mlperf_log_utils import mlperf_print, mlperf_submission_log, set_seeds, get_rank
set_seeds is not defined in mlperf_log_utils.py. I simply removed the import of set_seeds, as it is not used in preprocess.py anyway. I did the same thing in preprocess_fairseq.py.
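The import edit can also be applied programmatically. The following is only a sketch: drop_import_name is a hypothetical helper (not something the repository provides), it handles single-line "from X import a, b" statements only, and it leaves a line untouched if removing the name would empty the import.

```python
import re


def drop_import_name(source, module, name):
    """Remove `name` from single-line `from module import ...` statements.

    Hypothetical helper for illustration; parenthesized or multi-line
    imports are not handled.
    """
    pattern = re.compile(
        r"^([ \t]*from[ \t]+" + re.escape(module) + r"[ \t]+import[ \t]+)(.+)$",
        re.MULTILINE,
    )

    def rewrite(match):
        names = [n.strip() for n in match.group(2).split(",")]
        kept = [n for n in names if n and n != name]
        if not kept or len(kept) == len(names):
            # Nothing to remove, or removal would leave an empty import.
            return match.group(0)
        return match.group(1) + ", ".join(kept)

    return pattern.sub(rewrite, source)
```

Applied to the text of preprocess.py with module "mlperf_log_utils" and name "set_seeds", this would yield the import line without set_seeds; the result still has to be written back to the file.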
After that one more error remained from the line
mlperf_log.ROOT_DIR_TRANSFORMER = os.path.dirname(os.path.realpath(__file__))
NameError: name 'mlperf_log' is not defined
Importing mlperf_log_utils and replacing mlperf_log with mlperf_log_utils allows the preprocessing to run. After the scripts ran, I was able to run the benchmark with DATADIR set to examples/translation/wmt14_en_de/utf8/. However, needing these hacks to get it running makes me think I must be doing something wrong.
Are there further instructions I'm missing about getting the data for this benchmark? I'm hoping to reproduce the published results on a DGX1.