4 comments by 艾梦

I suspect it is caused by a small batch size. Did you use the same training batch size and the same number of epochs as the official implementation? @Colanim large batch size...

I found something related to this: [keras-users/EhWwuq6R0lQ](https://groups.google.com/forum/#!topic/keras-users/EhWwuq6R0lQ). I'm not familiar with Theano, so I don't know why it works on TensorFlow but not on Theano.

Oh, I see. Maybe Theano support isn't really necessary; these days we rarely use Theano anyway. I should have noticed that. It seems that I have done some...

Thanks for your advice. BERT is really too large for me. I will try your suggestion, and I wish you success with your new attempt.