How to use the pretrained checkpoint to continue training on my own corpus?
I want to load the pretrained checkpoint and continue training on my own corpus. I use the run_pretraining.py code and set init_checkpoint to the pretrained model directory, but when I run the code it raises this error:
ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
From /job:worker/replica:0/task:0:
Key bert/embeddings/LayerNorm/beta/adam_m not found in checkpoint
[[node save/RestoreV2 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
I know that after training finishes it is better to remove the adam_m and adam_v variables to reduce the size of the checkpoint file, but since I want to continue training from the pretrained checkpoint, how can I solve this problem? Maybe I can recreate the Adam slot variables in the checkpoint file? Thank you.
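What I have in mind, roughly, is something like the sketch below (untested; TF 1.x, and the checkpoint paths are just placeholders, not files from this thread): copy every variable out of the released checkpoint and write a new checkpoint that also contains zero-initialized adam_m / adam_v slots, so the Saver no longer complains about missing keys.

```python
# Untested sketch (TF 1.x): copy every variable from the released checkpoint
# and add zero-initialized adam_m / adam_v slots next to it, so that a Saver
# expecting the optimizer slots can restore without "Key ... not found".
import numpy as np
import tensorflow as tf

src_ckpt = "/path/to/pretrained/bert_model.ckpt"  # placeholder path
dst_ckpt = "/path/to/patched/bert_model.ckpt"     # placeholder path

with tf.Graph().as_default(), tf.Session() as sess:
    new_vars = []
    for name, shape in tf.train.list_variables(src_ckpt):
        value = tf.train.load_variable(src_ckpt, name)
        new_vars.append(tf.Variable(value, name=name))
        if name == "global_step":
            continue  # no optimizer slots for the step counter
        # Zero-filled Adam first/second-moment slots for each weight.
        new_vars.append(tf.Variable(np.zeros(shape, np.float32), name=name + "/adam_m"))
        new_vars.append(tf.Variable(np.zeros(shape, np.float32), name=name + "/adam_v"))
    sess.run(tf.variables_initializer(new_vars))
    tf.train.Saver(new_vars).save(sess, dst_ckpt)
```

Would that be the right direction, or is there a cleaner way?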
I ran into a similar issue while trying to load a model trained under TensorFlow 1.x into code upgraded for TensorFlow 2.0. If you have solved the issue, please share your approach.
@ibrahimishag The TensorFlow 2.0 variable names are different from the TensorFlow 1.x ones; you may refer here.
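As a quick check (just a sketch, the checkpoint path is a placeholder), you can list what the checkpoint actually contains and compare it against the key the restore op is asking for:

```python
# Sketch: list the variable names stored in the checkpoint and check which of
# the keys from the error message are missing. tf.train.list_variables works
# in both TF 1.x and TF 2.x.
import tensorflow as tf

ckpt = "/path/to/bert_model.ckpt"  # placeholder path
ckpt_names = {name for name, _ in tf.train.list_variables(ckpt)}

expected = {"bert/embeddings/LayerNorm/beta/adam_m"}  # key from the error above
print("missing from checkpoint:", expected - ckpt_names)
```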
Hi all! I'm trying to initialize from the mBERT checkpoint, but it is missing the "bert/embeddings/LayerNorm/beta/adam_m" key in its list of variables (just like you described). I'm using TF 1.14 and have not found a solution via converting the checkpoint in TF 2+. Did you find a solution?
Hi @RyanHuangNLP, if you have found a solution for this problem, would you mind sharing it? :)
Hi, I am also facing the same issue: while trying to train from the mBERT checkpoint, "Key bert/embeddings/LayerNorm/beta/adam_m not found in checkpoint"; while trying to predict from the mBERT checkpoint, "Key global_step not found in checkpoint". @RyanHuangNLP Did you find a solution for this? Thanks in advance!
Kindly share the solution if anyone knows it. Thanks.
Did anyone figure out a solution for this? I'm facing the same problem. Kindly share if someone does know.