
bert/embeddings/LayerNorm/beta shape mismatch

Open xjx0524 opened this issue 5 years ago • 10 comments

ValueError: Shape of variable bert/embeddings/LayerNorm/beta:0 ((128,)) doesn't match with shape of tensor bert/embeddings/LayerNorm/beta ([768]) from checkpoint reader. I want to continue pre-training the albert_base model using the run_pretraining_google script, but I get the error above. What could be the problem?

xjx0524 avatar Apr 14 '20 03:04 xjx0524

The config file is probably wrong. Follow the example in the [Pre-training Example] section.

brightmart avatar Apr 14 '20 06:04 brightmart

The config file is probably wrong. Follow the example in the [Pre-training Example] section.

I eventually got it working by changing init_checkpoint to the Chinese pre-trained model provided by the official albert project. What are the differences between your version and the official one?

xjx0524 avatar Apr 14 '20 06:04 xjx0524

ValueError: Shape of variable bert/embeddings/LayerNorm/beta:0 ((312,)) doesn't match with shape of tensor bert/embeddings/LayerNorm/beta ([2048]) from checkpoint reader.

sdd031215 avatar Jun 09 '20 09:06 sdd031215

I ran into this problem too, using:

export BERT_BASE_DIR=./albert_model
export TEXT_DIR=./data
nohup python3 run_classifier.py --task_name=lcqmc_pair --do_train=true --do_eval=true \
  --data_dir=$TEXT_DIR --vocab_file=./albert_config/vocab.txt \
  --bert_config_file=./albert_config/albert_config_tiny.json --max_seq_length=15 \
  --train_batch_size=64 --learning_rate=1e-4 --num_train_epochs=5 \
  --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &

sdd031215 avatar Jun 09 '20 09:06 sdd031215

My problem turned out to be a wrong bert_config_file.
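A quick way to catch this class of error before training: parse the JSON passed as --bert_config_file and compare its sizes against the shape named in the ValueError. The snippet below is only an illustration (not part of albert_zh); the config values mimic albert_config_tiny.json, and the checkpoint shape is the one from the error at the top of this thread.

```python
import json

# Illustrative config, mimicking albert_config_tiny.json; read your real file instead.
config = json.loads('{"hidden_size": 312, "embedding_size": 128, "num_hidden_layers": 4}')

# The ValueError names two shapes: the graph variable (built from this config)
# and the tensor stored in the checkpoint. If they disagree, the config and
# checkpoint come from different model sizes.
graph_beta = [config["embedding_size"]]  # bert/embeddings/LayerNorm/beta built from the config
ckpt_beta = [768]                        # shape the checkpoint reader actually reported

if graph_beta != ckpt_beta:
    print("config/checkpoint mismatch:", graph_beta, "vs", ckpt_beta)
```

If the two shapes differ, either the config file or the init_checkpoint belongs to a different model size, and no amount of retrying will make restore succeed.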

sdd031215 avatar Jun 09 '20 09:06 sdd031215

The config file is probably wrong. Follow the example in the [Pre-training Example] section.

I eventually got it working by changing init_checkpoint to the Chinese pre-trained model provided by the official albert project. What are the differences between your version and the official one?

With Google's albert code, loading a model downloaded from albert_zh hits the same error; loading a zh model from the albert repo itself works fine.

thomaszheng avatar Jun 22 '20 07:06 thomaszheng

Comparing with Google's model files shows that the parameters and the model structure differ somewhat. For example, the albert_large_zh model parameters in albert_zh:

{'bert/embeddings/word_embeddings': [21128, 128],
 'bert/embeddings/word_embeddings_2': [128, 1024],
 'bert/embeddings/token_type_embeddings': [2, 1024],
 'bert/embeddings/position_embeddings': [512, 1024],
 'bert/embeddings/LayerNorm/beta': [1024],
 'bert/embeddings/LayerNorm/gamma': [1024],
 'bert/encoder/layer_shared/attention/self/query/kernel': [1024, 1024],
 'bert/encoder/layer_shared/attention/self/query/bias': [1024],
 'bert/encoder/layer_shared/attention/self/key/kernel': [1024, 1024],
 'bert/encoder/layer_shared/attention/self/key/bias': [1024],
 'bert/encoder/layer_shared/attention/self/value/kernel': [1024, 1024],
 'bert/encoder/layer_shared/attention/self/value/bias': [1024],
 'bert/encoder/layer_shared/attention/output/dense/kernel': [1024, 1024],
 'bert/encoder/layer_shared/attention/output/dense/bias': [1024],
 'bert/encoder/layer_shared/attention/output/LayerNorm/gamma': [1024],
 'bert/encoder/layer_shared/attention/output/LayerNorm/beta': [1024],
 'bert/encoder/layer_shared/intermediate/dense/kernel': [1024, 4096],
 'bert/encoder/layer_shared/intermediate/dense/bias': [4096],
 'bert/encoder/layer_shared/output/dense/kernel': [4096, 1024],
 'bert/encoder/layer_shared/output/dense/bias': [1024],
 'bert/encoder/layer_shared/output/LayerNorm/beta': [1024],
 'bert/encoder/layer_shared/output/LayerNorm/gamma': [1024],
 'bert/pooler/dense/kernel': [1024, 1024],
 'bert/pooler/dense/bias': [1024],
 'cls/predictions/transform/dense/kernel': [1024, 1024],
 'cls/predictions/transform/dense/bias': [1024],
 'cls/predictions/transform/LayerNorm/beta': [1024],
 'cls/predictions/transform/LayerNorm/gamma': [1024],
 'cls/predictions/output_bias': [21128],
 'cls/seq_relationship/output_weights': [2, 1024],
 'cls/seq_relationship/output_bias': [2]}

And Google's albert_large_zh model parameters:

{'bert/embeddings/word_embeddings': [21128, 128],
 'bert/embeddings/token_type_embeddings': [2, 128],
 'bert/embeddings/position_embeddings': [512, 128],
 'bert/embeddings/LayerNorm/beta': [128],
 'bert/embeddings/LayerNorm/gamma': [128],
 'bert/encoder/embedding_hidden_mapping_in/kernel': [128, 1024],
 'bert/encoder/embedding_hidden_mapping_in/bias': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel': [1024, 1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel': [1024, 1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel': [1024, 1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel': [1024, 1024],
 'bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel': [1024, 4096],
 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias': [4096],
 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel': [4096, 1024],
 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta': [1024],
 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma': [1024],
 'bert/pooler/dense/kernel': [1024, 1024],
 'bert/pooler/dense/bias': [1024],
 'cls/predictions/transform/dense/kernel': [1024, 128],
 'cls/predictions/transform/dense/bias': [128],
 'cls/predictions/transform/LayerNorm/beta': [128],
 'cls/predictions/transform/LayerNorm/gamma': [128],
 'cls/predictions/output_bias': [21128],
 'cls/seq_relationship/output_weights': [2, 1024],
 'cls/seq_relationship/output_bias': [2],
 'global_step': []}

The two place the projection of the word embedding from 128 to 1024 dimensions differently, and the layer normalization sits in different positions. Many variable names also differ.

So loading an albert_zh model with Google's albert code does not work.
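The comparison above can be automated: tf.train.list_variables produces such {name: shape} maps from a real checkpoint, and a plain-Python diff then shows exactly which variables exist only on one side or disagree in shape. A minimal sketch, using a few entries trimmed from the dumps above:

```python
# {variable_name: shape} maps, trimmed from the two checkpoint dumps above.
# With TensorFlow available, dict(tf.train.list_variables(ckpt_path)) builds these.
albert_zh_vars = {
    "bert/embeddings/word_embeddings": [21128, 128],
    "bert/embeddings/word_embeddings_2": [128, 1024],
    "bert/embeddings/LayerNorm/beta": [1024],
    "bert/encoder/layer_shared/attention/self/query/kernel": [1024, 1024],
}
google_vars = {
    "bert/embeddings/word_embeddings": [21128, 128],
    "bert/embeddings/LayerNorm/beta": [128],
    "bert/encoder/embedding_hidden_mapping_in/kernel": [128, 1024],
    "bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel": [1024, 1024],
}

only_zh = sorted(set(albert_zh_vars) - set(google_vars))
only_google = sorted(set(google_vars) - set(albert_zh_vars))
shape_diff = {n: (albert_zh_vars[n], google_vars[n])
              for n in set(albert_zh_vars) & set(google_vars)
              if albert_zh_vars[n] != google_vars[n]}

print("only in albert_zh:", only_zh)
print("only in google:", only_google)
print("same name, different shape:", shape_diff)
```

Even on this trimmed sample, the diff surfaces both failure modes from this thread: names that exist in only one checkpoint (restore silently skips or errors on them) and shared names like bert/embeddings/LayerNorm/beta whose shapes disagree, which is exactly what the checkpoint reader's ValueError reports.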

weizjiang avatar Aug 04 '20 02:08 weizjiang

I ran into the same problem. Is there a solution?

zhangweijiqn avatar Apr 14 '21 13:04 zhangweijiqn

ValueError: Shape of variable bert/embeddings/word_embeddings:0 ((21128, 312)) doesn't match with shape of tensor bert/embeddings/word_embeddings ([21128, 128]) from checkpoint reader.

My fix was to change from bert.modeling import get_assignment_map_from_checkpoint, BertConfig, BertModel to from modeling import get_assignment_map_from_checkpoint, BertConfig, BertModel

You also need tf.train.init_from_checkpoint() and sess.run(tf.global_variables_initializer())
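For later readers: a hedged sketch of where those two calls typically sit in a TF1-style script. The checkpoint path is a placeholder and the model-building step is elided; this is the usual pattern, not albert_zh's actual code.

```python
import tensorflow.compat.v1 as tf

ckpt = "./albert_model/albert_model.ckpt"  # placeholder path, adjust to your model dir

# 1. Build the graph first (BertModel etc.), so its variables exist.
# 2. Map graph variables to checkpoint tensors, then register the restore.
#    init_from_checkpoint must run BEFORE the initializer is executed: it
#    overrides each matched variable's initializer to load from the checkpoint.
assignment_map, _ = get_assignment_map_from_checkpoint(tf.trainable_variables(), ckpt)
tf.train.init_from_checkpoint(ckpt, assignment_map)

# 3. Running the global initializer now pulls in the checkpoint values.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
```

The ordering is the whole trick: init_from_checkpoint only rewires initializers, so calling global_variables_initializer first (or not at all) leaves variables at random values.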

hjing100 avatar Aug 25 '21 03:08 hjing100

ValueError: Shape of variable bert/embeddings/word_embeddings:0 ((21128, 312)) doesn't match with shape of tensor bert/embeddings/word_embeddings ([21128, 128]) from checkpoint reader.

My fix was to change from bert.modeling import get_assignment_map_from_checkpoint, BertConfig, BertModel to from modeling import get_assignment_map_from_checkpoint, BertConfig, BertModel

You also need tf.train.init_from_checkpoint() and sess.run(tf.global_variables_initializer())

Regarding "You also need tf.train.init_from_checkpoint() and sess.run(tf.global_variables_initializer())": where exactly do I make this change?

EricKani avatar Oct 26 '22 08:10 EricKani