Preprocessed data are in the wrong path 512/wikipedia_pretrain.
BERT_pretrain.ipynb instructs users to download https://bertonazuremlwestus2.blob.core.windows.net/public/bert_data.tar.gz for the preprocessed data. The tar file extracts the data to 512/wikipedia_pretrain, but the notebook expects it at 512/wiki_pretrain.
The serialized data files wikipedia_segmented_part_NN.bin reference WikiNBookCorpusPretrainingDataCreator, which has been deleted in the latest code. Adding the following avoids the issue:
```python
class WikiNBookCorpusPretrainingDataCreator(PretrainingDataCreator):
    pass
```
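A minimal, self-contained sketch of why this alias works (class and file names here are illustrative, not from the AzureML-BERT codebase): pickle records the defining module and class name of each serialized object, so unpickling raises an AttributeError once the class is removed, and restoring an empty subclass under the old name makes the old .bin files loadable again.

```python
import pickle


class PretrainingDataCreator:
    """Stand-in for the base class; holds the deserialized documents."""
    def __init__(self, documents):
        self.documents = documents


# The class name that the old .bin files were serialized under.
class WikiNBookCorpusPretrainingDataCreator(PretrainingDataCreator):
    pass


blob = pickle.dumps(WikiNBookCorpusPretrainingDataCreator(["doc"]))

# Simulate the class being deleted from the latest code.
del WikiNBookCorpusPretrainingDataCreator

try:
    pickle.loads(blob)
except AttributeError as exc:
    print("load failed:", exc)

# Restoring an empty alias under the same name fixes deserialization.
class WikiNBookCorpusPretrainingDataCreator(PretrainingDataCreator):
    pass

obj = pickle.loads(blob)
print(obj.documents)
```

The alias needs no body because pickle only uses the class to construct the instance; the serialized attributes are restored onto it directly.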
@kaiidams thanks for reporting this issue. We will update the tar file soon. In the meantime, download and use the data referenced in https://github.com/microsoft/AzureML-BERT/blob/master/docs/artifacts.md#preprocessed-data; with that data you will not need the deleted class to load it.