[BERT] Evaluating the pre-trained google research model to get the MLM accuracy
Hello all,
We are curious what the MLM accuracy of our eval set is when run against the pre-trained model that google-research provides, specifically the bert-large-uncased model. However, when we execute the run_pretraining.py script to evaluate the model, we encounter the following error:
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key global_step not found in checkpoint
[[node save/RestoreV2 (defined at /home/.virtualenvs/ai/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
It seems that the downloaded google-research model does not contain a "global_step" key, so we are unable to load the model and measure its MLM accuracy.
Script used to evaluate the model:
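For reference, this is how we confirmed the key is indeed absent (the checkpoint path is a placeholder for our local copy):

import tensorflow as tf  # works with tf.compat.v1 on TF 2.x as well

ckpt_prefix = "/path/to/wwm_uncased_L-24_H-1024_A-16/bert_model.ckpt"

# list_variables returns (name, shape) for every tensor stored in the checkpoint;
# "global_step" does not appear in the output for the google-research checkpoint.
for name, shape in tf.train.list_variables(ckpt_prefix):
    print(name, shape)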
TF_XLA_FLAGS='--tf_xla_auto_jit=2' \
python3 run_pretraining.py \
  --bert_config_file=/path/to/bert_config.json \
  --output_dir=/path/to/wwm_uncased_L-24_H-1024_A-16/ \
  --input_file=/path/to/eval_10k \
  --do_eval \
  --nodo_train \
  --eval_batch_size=8 \
  --init_checkpoint=/path/to/tf1_ckpt/model.ckpt-28252.index \
  --iterations_per_loop=1000 \
  --learning_rate=0.0001 \
  --max_eval_steps=1250 \
  --max_predictions_per_seq=76 \
  --max_seq_length=512 \
  --num_gpus=1 \
  --num_train_steps=107538 \
  --num_warmup_steps=1562 \
  --optimizer=lamb \
  --save_checkpoints_steps=1562 \
  --start_warmup_step=0 \
  --train_batch_size=24 \
  --nouse_tpu
BERT-Large-Uncased model provided by google-research: BERT-Large, Uncased (Whole Word Masking) from their GitHub repo.
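One workaround we are considering (not verified end-to-end) is to rewrite the checkpoint ourselves and inject a global_step variable so the Saver can restore it. A minimal sketch, with placeholder paths:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

src = "/path/to/wwm_uncased_L-24_H-1024_A-16/bert_model.ckpt"
dst = "/path/to/wwm_uncased_L-24_H-1024_A-16_with_step/bert_model.ckpt"

with tf.Session() as sess:
    new_vars = []
    # Recreate every tensor stored in the original checkpoint as a variable.
    for name, _ in tf.train.list_variables(src):
        value = tf.train.load_variable(src, name)
        new_vars.append(tf.Variable(value, name=name))
    # Add the step counter the restore is looking for; 0 is arbitrary since we only run eval.
    new_vars.append(tf.Variable(0, name="global_step", dtype=tf.int64))
    sess.run(tf.global_variables_initializer())
    tf.train.Saver(new_vars).save(sess, dst)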
Has anyone encountered a similar issue? If there is a solution for this, kindly share.
Thank you.
@sgpyc any idea what this is about?
The MLPerf reference BERT model is in fact slightly modified from the Google Research model. As far as I remember, the math should be the same; the input dataset is different.
I need to check whether the step counter can be skipped during checkpoint loading.
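Something along these lines might work (rough and untested): build an assignment map containing only the variables actually present in the checkpoint, so the missing global_step key is never requested.

import tensorflow.compat.v1 as tf

def assignment_map_without_missing_keys(ckpt_prefix):
    # Names of all tensors stored in the checkpoint.
    ckpt_names = {name for name, _ in tf.train.list_variables(ckpt_prefix)}
    amap = {}
    for var in tf.global_variables():
        name = var.name.split(":")[0]
        if name in ckpt_names:  # global_step is skipped because it is absent
            amap[name] = var
    return amap

# Used in place of a full Saver restore:
# tf.train.init_from_checkpoint(ckpt_prefix, assignment_map_without_missing_keys(ckpt_prefix))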