wangwei7175878
Hi there, I trained the model on a big dataset (wiki 2500M + BooksCorpus 800M, same as the BERT paper) for 200000 steps and achieved an accuracy of 91%. I...
@codertimo The model can't converge when using weight_decay = 0.01. My dataset is not exactly the original corpus, but I think it is almost the same. The wiki data can easily be downloaded...
@briandw My pre-trained model failed on downstream tasks (the fine-tuned model can't converge). I will share the pre-trained model once it works.
@codertimo Here is the whole log. It took me almost one week to train about 250000 steps. The accuracy seems to be stuck at 91%, while it is reported as 98%...
Hi there, I believe I fixed why the model can't converge with weight_decay = 0.01. Following OpenAI's code [here](https://github.com/openai/finetune-transformer-lm/blob/master/opt.py), I think BERT used AdamW instead of Adam. After rewriting this Adam...
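For anyone following along: the key difference is that AdamW applies weight decay directly to the parameters ("decoupled"), instead of adding an L2 term to the gradient and letting it pass through Adam's adaptive moments. Below is a minimal plain-Python sketch of one AdamW update on a scalar parameter; the function name and defaults are mine for illustration, not code from this repo or from OpenAI's `opt.py`.

```python
import math

def adamw_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on a scalar parameter (decoupled weight decay).

    Unlike Adam with L2 regularization, the decay term is NOT added to
    the gradient (so it never enters the adaptive moment estimates);
    it is applied to the parameter directly.
    """
    state['step'] += 1
    b1, b2 = betas
    # Standard Adam moment updates on the raw gradient.
    state['m'] = b1 * state['m'] + (1 - b1) * grad
    state['v'] = b2 * state['v'] + (1 - b2) * grad * grad
    m_hat = state['m'] / (1 - b1 ** state['step'])
    v_hat = state['v'] / (1 - b2 ** state['step'])
    # Adam step ...
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    # ... then decoupled weight decay, scaled only by the learning rate.
    param -= lr * weight_decay * param
    return param

state = {'step': 0, 'm': 0.0, 'v': 0.0}
w = adamw_step(1.0, grad=0.5, state=state)
```

With plain Adam + L2, the decay would instead be folded into `grad` before the moment updates, where the adaptive rescaling can largely cancel it; that is the behavior the decoupled version avoids.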
Could you please be more specific about what you mean by "teacher model"? It's not clear to me.
The corresponding StructBERT 2.0 model has not yet been open-sourced in this project.
@taolei87 `cat /usr/local/cuda/version.txt` gives `CUDA Version 8.0.61`, and `torch.cuda.is_available()` returns `True`.