[BERT] Evaluating the pre-trained google research model to get the MLM accuracy
Hello all,
We are curious what the MLM accuracy of our eval set is when run against the pre-trained model that google-research provides, specifically the bert-large-uncased model. However, when we execute the run_pretraining.py script to evaluate the model, we encounter the following error:
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key global_step not found in checkpoint
[[node save/RestoreV2 (defined at /home/.virtualenvs/ai/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
It seems that the downloaded google-research model does not contain a "global_step" key, so we are unable to load the model and measure its MLM accuracy.
Script used to evaluate the model:
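For reference, this is how we confirmed the key is indeed absent (the checkpoint path is a placeholder for our local copy):

import tensorflow as tf  # works with tf.compat.v1 on TF 2.x as well

ckpt_prefix = "/path/to/wwm_uncased_L-24_H-1024_A-16/bert_model.ckpt"

# list_variables returns (name, shape) for every tensor stored in the checkpoint;
# "global_step" does not appear in the output for the google-research checkpoint.
for name, shape in tf.train.list_variables(ckpt_prefix):
    print(name, shape)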
TF_XLA_FLAGS='--tf_xla_auto_jit=2' \
python3 run_pretraining.py \
  --bert_config_file=/path/to/bert_config.json \
  --output_dir=/path/to/wwm_uncased_L-24_H-1024_A-16/ \
  --input_file=/path/to/eval_10k \
  --do_eval \
  --nodo_train \
  --eval_batch_size=8 \
  --init_checkpoint=/path/to/tf1_ckpt/model.ckpt-28252.index \
  --iterations_per_loop=1000 \
  --learning_rate=0.0001 \
  --max_eval_steps=1250 \
  --max_predictions_per_seq=76 \
  --max_seq_length=512 \
  --num_gpus=1 \
  --num_train_steps=107538 \
  --num_warmup_steps=1562 \
  --optimizer=lamb \
  --save_checkpoints_steps=1562 \
  --start_warmup_step=0 \
  --train_batch_size=24 \
  --nouse_tpu
BERT-Large-Uncased model provided by google-research: BERT-Large, Uncased (Whole Word Masking) from their GitHub repo.
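One workaround we are considering (not verified end-to-end) is to rewrite the checkpoint ourselves and inject a global_step variable so the Saver can restore it. A minimal sketch, with placeholder paths:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

src = "/path/to/wwm_uncased_L-24_H-1024_A-16/bert_model.ckpt"
dst = "/path/to/wwm_uncased_L-24_H-1024_A-16_with_step/bert_model.ckpt"

with tf.Session() as sess:
    new_vars = []
    # Recreate every tensor stored in the original checkpoint as a variable.
    for name, _ in tf.train.list_variables(src):
        value = tf.train.load_variable(src, name)
        new_vars.append(tf.Variable(value, name=name))
    # Add the step counter the restore is looking for; 0 is arbitrary since we only run eval.
    new_vars.append(tf.Variable(0, name="global_step", dtype=tf.int64))
    sess.run(tf.global_variables_initializer())
    tf.train.Saver(new_vars).save(sess, dst)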
Has anyone encountered a similar issue? If there is a solution for this, kindly share.
Thank you.
@sgpyc any idea what this is about?
The MLPerf reference BERT model is in fact slightly modified from the Google Research model. As far as I remember, the math should be the same; the input dataset is different.
I need to check whether the step counter can be skipped during checkpoint loading.
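Something along these lines might work (rough and untested): build an assignment map containing only the variables actually present in the checkpoint, so the missing global_step key is never requested.

import tensorflow.compat.v1 as tf

def assignment_map_without_missing_keys(ckpt_prefix):
    # Names of all tensors stored in the checkpoint.
    ckpt_names = {name for name, _ in tf.train.list_variables(ckpt_prefix)}
    amap = {}
    for var in tf.global_variables():
        name = var.name.split(":")[0]
        if name in ckpt_names:  # global_step is skipped because it is absent
            amap[name] = var
    return amap

# Used in place of a full Saver restore:
# tf.train.init_from_checkpoint(ckpt_prefix, assignment_map_without_missing_keys(ckpt_prefix))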