Danny-Google
Yes, we do. Please stay tuned.
Thanks, we are working on it now.
If you turn off dropout, you may be able to use a larger batch size.
Could you post more info, such as your training setup? Also, you may want to start with our Colab tutorial.
We haven't tried it on the TPU v2 version, but how about trying it without dropout? We found that removing dropout can significantly reduce memory consumption.
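For reference, a minimal sketch of disabling dropout by editing the model's config file before training. The field names follow the BERT-style JSON config used by the ALBERT checkpoints, and the file path here is only an example:

```python
import json

# Example path to the config file shipped with the pretrained model.
config_path = "albert_base/albert_config.json"

with open(config_path) as f:
    config = json.load(f)

# Setting both dropout probabilities to 0 disables dropout,
# which reduces memory consumption and allows a larger batch size.
config["hidden_dropout_prob"] = 0.0
config["attention_probs_dropout_prob"] = 0.0

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```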
For the Chinese models, we use the WordPiece model provided by Jacob, as SentencePiece gives worse performance on reading comprehension tasks for Chinese.
The vocab file is in the same folder as the model. For WordPiece, you only need the vocab file, not the model. You can skip the model part for...
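As a concrete illustration, here is a minimal sketch of loading a WordPiece tokenizer from the vocab file alone, assuming the `FullTokenizer` from the ALBERT repo's `tokenization.py`; the vocab path is an example, and leaving `spm_model_file` unset means no SentencePiece model is needed:

```python
import tokenization  # tokenization.py from the ALBERT repo

# For the Chinese WordPiece models, only the vocab file is required;
# with spm_model_file left as None, the tokenizer falls back to WordPiece.
tokenizer = tokenization.FullTokenizer(
    vocab_file="albert_base_zh/vocab_chinese.txt",  # example path
    do_lower_case=True,
    spm_model_file=None,
)

tokens = tokenizer.tokenize("这是一个例子")  # "This is an example"
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens, ids)
```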
@beamind Currently, squad_utils is meant to be used only for the SQuAD dataset. If you use the Chinese models, you may want to take a look at the CLUE code (https://github.com/CLUEbenchmark/CLUE/tree/master/baselines/models/albert). @008karan...
Yes, you can find the comparison on the Chinese CLUE page (https://github.com/CLUEbenchmark/CLUE). Maybe it is because of the way I trained it. The xxlarge model is sensitive to the downstream hyperparameters....
The xxlarge models in the first and second tables were trained by me. The xxlarge model is not very stable, as there were some problems in training it....