albert_zh
bert/embeddings/LayerNorm/beta shape mismatch
ValueError: Shape of variable bert/embeddings/LayerNorm/beta:0 ((128,)) doesn't match with shape of tensor bert/embeddings/LayerNorm/beta ([768]) from checkpoint reader.
I want to pre-train the albert_base model using the run_pretraining_google file, but I get the error above. What is the problem?
Your config file probably doesn't match. Follow the steps in the [Pre-training Example] section.
I finally got it working by changing init_checkpoint to the Chinese pre-trained model provided by the official albert project. What are the differences between your version and the official one?
ValueError: Shape of variable bert/embeddings/LayerNorm/beta:0 ((312,)) doesn't match with shape of tensor bert/embeddings/LayerNorm/beta ([2048]) from checkpoint reader.
I ran into this problem too, using:
export BERT_BASE_DIR=./albert_model
export TEXT_DIR=./data
nohup python3 run_classifier.py --task_name=lcqmc_pair --do_train=true --do_eval=true --data_dir=$TEXT_DIR --vocab_file=./albert_config/vocab.txt \
  --bert_config_file=./albert_config/albert_config_tiny.json --max_seq_length=15 --train_batch_size=64 --learning_rate=1e-4 --num_train_epochs=5 \
  --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &
My problem turned out to be that bert_config_file was wrong.
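A quick way to check which config a checkpoint expects is to list its variable shapes. Here is a rough sketch (TensorFlow 1.x; the paths are just examples) that compares the checkpoint's bert/embeddings/LayerNorm/beta length, which in albert_zh-format checkpoints equals hidden_size, against the hidden_size in the chosen albert_config_*.json:

```python
import json
import tensorflow as tf  # TF 1.x

ckpt_path = "./albert_model/albert_model.ckpt"            # example path
config_path = "./albert_config/albert_config_tiny.json"   # example path

# Shapes of every variable stored in the checkpoint.
ckpt_shapes = dict(tf.train.list_variables(ckpt_path))

# In albert_zh-format checkpoints this LayerNorm is applied after the
# projection to hidden_size, so its length equals hidden_size.
ckpt_hidden = ckpt_shapes["bert/embeddings/LayerNorm/beta"][0]

with open(config_path) as f:
    config_hidden = json.load(f)["hidden_size"]

print("checkpoint hidden size:", ckpt_hidden)
print("config hidden size:", config_hidden)
assert ckpt_hidden == config_hidden, "bert_config_file does not match init_checkpoint"
```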
I hit the same error when loading a model downloaded from albert_zh with Google's albert code; loading the zh model released by the official albert project works fine.
Comparing against Google's model file, you can see the parameters and model structure differ somewhat. For example, the albert_large_zh model parameters in albert_zh are: {'bert/embeddings/word_embeddings': [21128, 128], 'bert/embeddings/word_embeddings_2': [128, 1024], 'bert/embeddings/token_type_embeddings': [2, 1024], 'bert/embeddings/position_embeddings': [512, 1024], 'bert/embeddings/LayerNorm/beta': [1024], 'bert/embeddings/LayerNorm/gamma': [1024], 'bert/encoder/layer_shared/attention/self/query/kernel': [1024, 1024], 'bert/encoder/layer_shared/attention/self/query/bias': [1024], 'bert/encoder/layer_shared/attention/self/key/kernel': [1024, 1024], 'bert/encoder/layer_shared/attention/self/key/bias': [1024], 'bert/encoder/layer_shared/attention/self/value/kernel': [1024, 1024], 'bert/encoder/layer_shared/attention/self/value/bias': [1024], 'bert/encoder/layer_shared/attention/output/dense/kernel': [1024, 1024], 'bert/encoder/layer_shared/attention/output/dense/bias': [1024], 'bert/encoder/layer_shared/attention/output/LayerNorm/gamma': [1024], 'bert/encoder/layer_shared/attention/output/LayerNorm/beta': [1024], 'bert/encoder/layer_shared/intermediate/dense/kernel': [1024, 4096], 'bert/encoder/layer_shared/intermediate/dense/bias': [4096], 'bert/encoder/layer_shared/output/dense/kernel': [4096, 1024], 'bert/encoder/layer_shared/output/dense/bias': [1024], 'bert/encoder/layer_shared/output/LayerNorm/beta': [1024], 'bert/encoder/layer_shared/output/LayerNorm/gamma': [1024], 'bert/pooler/dense/kernel': [1024, 1024], 'bert/pooler/dense/bias': [1024], 'cls/predictions/transform/dense/kernel': [1024, 1024], 'cls/predictions/transform/dense/bias': [1024], 'cls/predictions/transform/LayerNorm/beta': [1024], 'cls/predictions/transform/LayerNorm/gamma': [1024], 'cls/predictions/output_bias': [21128], 'cls/seq_relationship/output_weights': [2, 1024], 'cls/seq_relationship/output_bias': [2] }
Whereas Google's albert_large_zh model parameters are: {'bert/embeddings/word_embeddings': [21128, 128], 'bert/embeddings/token_type_embeddings': [2, 128], 'bert/embeddings/position_embeddings': [512, 128], 'bert/embeddings/LayerNorm/beta': [128], 'bert/embeddings/LayerNorm/gamma': [128], 'bert/encoder/embedding_hidden_mapping_in/kernel': [128, 1024], 'bert/encoder/embedding_hidden_mapping_in/bias': [1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/kernel': [1024, 1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/query/bias': [1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/kernel': [1024, 1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/key/bias': [1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/kernel': [1024, 1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/self/value/bias': [1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/kernel': [1024, 1024], 'bert/encoder/transformer/group_0/inner_group_0/attention_1/output/dense/bias': [1024], 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm/beta': [1024], 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm/gamma': [1024], 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/kernel': [1024, 4096], 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/dense/bias': [4096], 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/kernel': [4096, 1024], 'bert/encoder/transformer/group_0/inner_group_0/ffn_1/intermediate/output/dense/bias': [1024], 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/beta': [1024], 'bert/encoder/transformer/group_0/inner_group_0/LayerNorm_1/gamma': [1024], 'bert/pooler/dense/kernel': [1024, 1024], 'bert/pooler/dense/bias': [1024], 'cls/predictions/transform/dense/kernel': [1024, 128], 'cls/predictions/transform/dense/bias': [128], 'cls/predictions/transform/LayerNorm/beta': [128], 'cls/predictions/transform/LayerNorm/gamma': [128], 'cls/predictions/output_bias': [21128], 'cls/seq_relationship/output_weights': [2, 1024], 'cls/seq_relationship/output_bias': [2], 'global_step': []}
The two differ in where the word embedding is projected from 128 to 1024 dimensions, and the position of layer normalization also differs. Many variable names differ as well.
Therefore, Google's albert code cannot load an albert_zh model.
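For reference, the listings above can be reproduced by dumping variable shapes from each checkpoint; a rough sketch (TensorFlow 1.x, checkpoint paths are placeholders):

```python
import tensorflow as tf  # TF 1.x

def checkpoint_shapes(ckpt_path):
    """Return {variable_name: shape} for every variable in a checkpoint."""
    return {name: shape for name, shape in tf.train.list_variables(ckpt_path)}

albert_zh_vars = checkpoint_shapes("./albert_large_zh/albert_model.ckpt")     # placeholder path
google_vars = checkpoint_shapes("./albert_large_google_zh/model.ckpt-best")   # placeholder path

# Names that exist in only one of the checkpoints expose the structural differences
# (layer_shared/* vs. transformer/group_0/inner_group_0/*, word_embeddings_2, etc.).
print("only in albert_zh:", sorted(set(albert_zh_vars) - set(google_vars)))
print("only in google albert:", sorted(set(google_vars) - set(albert_zh_vars)))
```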
I ran into the same problem. Is there a solution?
ValueError: Shape of variable bert/embeddings/word_embeddings:0 ((21128, 312)) doesn't match with shape of tensor bert/embeddings/word_embeddings ([21128, 128]) from checkpoint reader.
My solution was to change "from bert.modeling import get_assignment_map_from_checkpoint, BertConfig, BertModel" to "from modeling import get_assignment_map_from_checkpoint, BertConfig, BertModel".
You also need tf.train.init_from_checkpoint() and sess.run(tf.global_variables_initializer()).
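Roughly, those two calls go where the graph is built and the session is created; here is a sketch of the standard loading pattern (paths and sequence length are just examples, not the exact code from this repo):

```python
import tensorflow as tf  # TF 1.x
from modeling import get_assignment_map_from_checkpoint, BertConfig, BertModel

init_checkpoint = "./albert_model/albert_model.ckpt"  # example path
bert_config = BertConfig.from_json_file("./albert_config/albert_config_tiny.json")  # example path

# Build the graph first so the variables exist.
input_ids = tf.placeholder(tf.int32, [None, 128], name="input_ids")  # 128 = example seq length
model = BertModel(config=bert_config, is_training=False, input_ids=input_ids)

# Map the checkpoint's variables onto the ones just created, then override
# their initializers with the checkpoint values.
tvars = tf.trainable_variables()
assignment_map, _ = get_assignment_map_from_checkpoint(tvars, init_checkpoint)
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)

with tf.Session() as sess:
    # Running the initializer now restores the checkpoint weights.
    sess.run(tf.global_variables_initializer())
```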
Regarding "You also need tf.train.init_from_checkpoint() and sess.run(tf.global_variables_initializer())": where exactly should these changes be made?