BERT-pytorch
Default model sizes are much smaller than BERT base
The base BERT model in https://arxiv.org/pdf/1810.04805.pdf uses 768 hidden features, 12 layers, and 12 attention heads (which are also the defaults in bert.py), while the default configuration in the argparser of __main__.py uses 256 hidden features, 8 layers, and 8 heads. Would it make sense to align the example script with the paper? I spent quite a while puzzling over my low GPU utilization with the default configuration. Thanks!
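One way to make the alignment concrete would be to change the argparse defaults to the BERT-base values from the paper. A minimal sketch, assuming hypothetical flag names modeled on typical BERT-pytorch options (the actual names in __main__.py may differ):

```python
import argparse

# Sketch: argparse defaults set to BERT-base (768 hidden, 12 layers,
# 12 attention heads) rather than the smaller 256/8/8 configuration.
# Flag names here are illustrative, not taken from the repo.
parser = argparse.ArgumentParser()
parser.add_argument("--hidden", type=int, default=768,
                    help="hidden size of the transformer model")
parser.add_argument("--layers", type=int, default=12,
                    help="number of transformer layers")
parser.add_argument("--attn_heads", type=int, default=12,
                    help="number of attention heads")

# Parse an empty argument list to show the defaults that would apply.
args = parser.parse_args([])
print(args.hidden, args.layers, args.attn_heads)  # → 768 12 12
```

Keeping the smaller 256/8/8 defaults can still be useful for quick smoke tests, so another option would be documenting them as a deliberately tiny demo configuration instead of changing them.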