wangwei7175878
Hi there, I trained the model on a big dataset (wiki 2500M + BooksCorpus 800M, same as the BERT paper) for 200000 steps and achieved an accuracy of 91%. I...
@codertimo The model can't converge when using weight_decay = 0.01. My dataset is not exactly the original corpus, but I think it is almost the same. The wiki data can easily be downloaded...
@briandw My pre-trained model failed on downstream tasks (the fine-tuned model can't converge). I will share the pre-trained model once it works.
@codertimo Here is the whole log. It took me almost one week to train about 250000 steps. The accuracy seems to be stuck at 91%, while it is reported as 98%...
Hi there, I believe I fixed why the model can't converge with weight_decay = 0.01. Following OpenAI's code [here](https://github.com/openai/finetune-transformer-lm/blob/master/opt.py), I think BERT used AdamW instead of Adam. After rewriting this Adam...
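For anyone following along: the key difference is that AdamW applies weight decay directly to the parameters ("decoupled"), instead of adding an L2 term to the gradient and letting it pass through Adam's adaptive moments. Below is a minimal plain-Python sketch of one AdamW update on a scalar parameter; the function name and defaults are mine for illustration, not code from this repo or from OpenAI's `opt.py`.

```python
import math

def adamw_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on a scalar parameter (decoupled weight decay).

    Unlike Adam with L2 regularization, the decay term is NOT added to
    the gradient (so it never enters the adaptive moment estimates);
    it is applied to the parameter directly.
    """
    state['step'] += 1
    b1, b2 = betas
    # Standard Adam moment updates on the raw gradient.
    state['m'] = b1 * state['m'] + (1 - b1) * grad
    state['v'] = b2 * state['v'] + (1 - b2) * grad * grad
    m_hat = state['m'] / (1 - b1 ** state['step'])
    v_hat = state['v'] / (1 - b2 ** state['step'])
    # Adam step ...
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    # ... then decoupled weight decay, scaled only by the learning rate.
    param -= lr * weight_decay * param
    return param

state = {'step': 0, 'm': 0.0, 'v': 0.0}
w = adamw_step(1.0, grad=0.5, state=state)
```

With plain Adam + L2, the decay would instead be folded into `grad` before the moment updates, where the adaptive rescaling can largely cancel it; that is the behavior the decoupled version avoids.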
Could you please be more specific about what you mean by "teacher model"? It's not clear to me.
The corresponding StructBERT 2.0 model has not yet been open-sourced in this project.
@taolei87 `cat /usr/local/cuda/version.txt` gives `CUDA Version 8.0.61`, and `torch.cuda.is_available()` returns `True`.