Danny-Google
Yes, we do. Please stay tuned.
Thanks, we are working on it now.
If you turn off dropout, you may be able to use a larger batch size.
Could you post more info, such as your training setup? Also, you may want to start with our Colab tutorial.
We haven't tried it on the TPU v2 version, but how about trying it without dropout? We found that removing dropout can significantly reduce memory consumption.
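For reference, a minimal sketch of disabling dropout by editing the model's config file before training. The field names follow the BERT-style JSON config used by the ALBERT checkpoints, and the file path here is only an example:

```python
import json

# Example path to the config file shipped with the pretrained model.
config_path = "albert_base/albert_config.json"

with open(config_path) as f:
    config = json.load(f)

# Setting both dropout probabilities to 0 disables dropout,
# which reduces memory consumption and allows a larger batch size.
config["hidden_dropout_prob"] = 0.0
config["attention_probs_dropout_prob"] = 0.0

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```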
For the Chinese models, we use the WordPiece model provided by Jacob, as SentencePiece gives worse performance on reading comprehension tasks for Chinese.
The vocab file is in the same folder as the model. For WordPiece, you only need the vocab file, not the model. You can skip the model part for...
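As a concrete illustration, here is a minimal sketch of loading a WordPiece tokenizer from the vocab file alone, assuming the `FullTokenizer` from the ALBERT repo's `tokenization.py`; the vocab path is an example, and leaving `spm_model_file` unset means no SentencePiece model is needed:

```python
import tokenization  # tokenization.py from the ALBERT repo

# For the Chinese WordPiece models, only the vocab file is required;
# with spm_model_file left as None, the tokenizer falls back to WordPiece.
tokenizer = tokenization.FullTokenizer(
    vocab_file="albert_base_zh/vocab_chinese.txt",  # example path
    do_lower_case=True,
    spm_model_file=None,
)

tokens = tokenizer.tokenize("这是一个例子")  # "This is an example"
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens, ids)
```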
@beamind Currently, squad_utils is meant to be used only for the SQuAD dataset. If you use the Chinese models, you may want to take a look at the CLUE code (https://github.com/CLUEbenchmark/CLUE/tree/master/baselines/models/albert). @008karan...
Yes, you can find the comparison on the Chinese CLUE page (https://github.com/CLUEbenchmark/CLUE). Maybe it is because of the way I trained it. The xxlarge model is sensitive to the downstream hyperparameters....
The xxlarge models in the first and second tables were trained by me. The xxlarge model is not very stable, as there were some problems in training it....