albert_pytorch
A Lite BERT for Self-Supervised Learning of Language Representations

In the paper "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes", the learning rate depends on the batch size. However, I find that the learning rate is also...
Hello: I train a QA model with the same data pipeline in both cases. With bert-wwm I can set the batch size to 12, but with albert-xxlarge-v2 I can only go up to 6. Yet the albert-xxlarge-v2 checkpoint file is only about 900 MB, while the bert-wwm checkpoint is 1400 MB. What could possibly cause this?
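A plausible explanation for the question above: ALBERT shares one set of transformer weights across all layers, so its checkpoint is small, but GPU memory during training is dominated by per-layer activations, which scale with the hidden and feed-forward sizes rather than with the parameter count. A very coarse sketch, using the published config sizes (24 layers / hidden 1024 / FFN 4096 for a large BERT; 12 layers / hidden 4096 / FFN 16384 for albert-xxlarge-v2) and assuming sequence length 384 with float32 activations; attention maps and optimizer states are ignored:

```python
def per_example_activation_mib(num_layers, hidden, intermediate, seq_len):
    """Coarse lower bound: one hidden-state tensor plus one FFN tensor
    per layer, per example, in float32 (4 bytes)."""
    return num_layers * seq_len * (hidden + intermediate) * 4 / 2**20

# Assumed configs, not measured values:
bert_large = per_example_activation_mib(24, 1024, 4096, 384)
albert_xxl = per_example_activation_mib(12, 4096, 16384, 384)

print(f"large BERT, per example:        {bert_large:.0f} MiB")
print(f"albert-xxlarge-v2, per example: {albert_xxl:.0f} MiB")
```

By this rough count albert-xxlarge-v2 needs about twice the activation memory per example, which is consistent with only fitting half the batch size despite the smaller file on disk.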
INFO:model.modeling_albert_bright:Initialize PyTorch weight ['bert', 'embeddings', 'position_embeddings']
INFO:model.modeling_albert_bright:Skipping bert/embeddings/position_embeddings/lamb_m
Traceback (most recent call last):
  File "convert_albert_tf_checkpoint_to_pytorch.py", line 59, in
    args.pytorch_dump_path)
  File "convert_albert_tf_checkpoint_to_pytorch.py", line 34, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_albert(model, config, tf_checkpoint_path)
  File "/data/albert_pytorch-master/model/modeling_albert_bright.py",...
Hello, I am trying to continue pretraining albert_small on a financial corpus, starting from your pretrained albert_small weights. The problem: after training on 100k financial documents, model accuracy stops improving and the loss stops decreasing, even with further training. With the original learning rate (0.000176) training diverges; I have lowered it to 1e-5 and 1e-6, but learning still stalls. My albert_small results so far: training accuracies of only 57 and 68. 1) Could you share the training results you got for albert_small? 2) Can you share any experience on improving pretraining quality?
https://github.com/lonePatient/albert_pytorch/blob/e9dbe3ce9aa49e787774b050cbdc496046e0c5bf/run_classifier.py#L110-L122
The above is run_classifier.py, lines 110-122. If `args.gradient_accumulation_steps` keeps its default value of 1, there is no problem. But when it is set to another value, say `4`, the first 3 iterations of the inner loop (step = 0 through 2) fail the `if` check at line 110, so `global_step` stays at 0. The check at line 116 then almost always passes (because with `global_step = 0`, `global_step % args.logging_steps == 0` holds unconditionally), so `evaluate` runs 3 pointless times before a single gradient update has happened. So there seems to be a small flaw here. My understanding is that `global_step` should stay consistent with `num_training_steps` at line 78, ```logger.info(" Total optimization steps = %d", num_training_steps)```: `global_step` should only be incremented once per actual gradient update, i.e. once per effective batch, which is also why `train()` ultimately returns loss = `tr_loss / global_step`. So I wonder whether simply adding the condition `global_step != 0` to the checks at lines 116 and 121 would be enough to work around the problem for now.
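The proposed guard can be sketched with a simplified, hypothetical version of the loop (only the counters are kept; `evaluations` stands in for the `evaluate(...)` call in the real script):

```python
def train(num_batches, accum_steps, logging_steps, guard):
    """Simplified gradient-accumulation loop mirroring the structure of
    run_classifier.py lines 110-122. `guard` toggles the proposed fix."""
    global_step = 0
    evaluations = 0
    for step in range(num_batches):
        # ... forward / backward on one micro-batch would happen here ...
        if (step + 1) % accum_steps == 0:
            # optimizer.step(); scheduler.step(); model.zero_grad()
            global_step += 1  # one real optimizer update
        # Without the guard, `0 % logging_steps == 0` holds before the first
        # update, so evaluation fires on every micro-batch while global_step
        # is still 0.
        if (not guard or global_step != 0) and global_step % logging_steps == 0:
            evaluations += 1  # stands in for evaluate(args, model, tokenizer)
    return global_step, evaluations

print(train(8, 4, 1, guard=False))  # evaluates even before any update
print(train(8, 4, 1, guard=True))   # no evaluation until global_step > 0
```

With 8 micro-batches and accumulation 4 there are only 2 real updates; the unguarded version evaluates on all 8 micro-batches, the guarded one only once an update has occurred.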
RuntimeError: Error(s) in loading state_dict for BertModel:
size mismatch for bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([21128, 128]) from checkpoint, the shape in current model is torch.Size([21128, 312]).
========================
Can...
Hi~ I want to use _AlbertForPreTraining_ to do a MaskedLM task on new datasets (with a **new vocab.txt**, whose size is not 21128) **based on the pretrained weights**. How can I do that...
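One common approach to the question above is to keep the pretrained weights and only rebuild the word-embedding matrix for the new vocabulary. A minimal sketch in plain PyTorch, with hypothetical sizes (the real checkpoint here has 21128 rows); it assumes tokens shared between the old and new vocab keep the same ids, otherwise you would copy rows by matching the token strings instead:

```python
import torch
import torch.nn as nn

old_vocab, new_vocab, dim = 21128, 25000, 128
old_emb = nn.Embedding(old_vocab, dim)       # stands in for the loaded pretrained weights

new_emb = nn.Embedding(new_vocab, dim)
nn.init.normal_(new_emb.weight, std=0.02)    # BERT-style init for brand-new tokens
with torch.no_grad():
    n = min(old_vocab, new_vocab)
    new_emb.weight[:n] = old_emb.weight[:n]  # reuse rows for overlapping token ids

print(new_emb.weight.shape)  # torch.Size([25000, 128])
```

If the model class follows the Hugging Face `PreTrainedModel` API, `model.resize_token_embeddings(new_size)` does essentially this (including the tied output layer); whether this repo's model class exposes that method would need to be checked.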
I know the BERT model can handle the CLOTH test, but I want to use the ALBERT model for the CLOTH test. I would appreciate any help.
The parameters of project_layer should correspond to the parameters of bert.embeddings.word_embeddings_2.