BERT-keras
number of trainable parameters
I don't quite understand one point. When I downloaded your Keras implementation of BERT and checked the number of trainable parameters in the model summary, it showed ~177 million parameters, while the official BERT base model should have 110 million. Could you explain where this difference comes from?
Hi, I'm not entirely sure, but maybe it's because of the subword embeddings? Most of the time people don't count input embeddings in their model parameters.
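For reference, the official ~110M figure for BERT-base can be reproduced with a back-of-the-envelope count, assuming the standard hyperparameters from the BERT paper (vocab 30522, 12 layers, hidden size 768, FFN size 3072). This is just arithmetic, not this repo's actual graph, so the extra ~67M in the Keras summary would have to come from components outside this count (e.g. a larger embedding table):

```python
# Rough parameter count for BERT-base (standard hyperparameters assumed).
V, P, S, H, L, F = 30522, 512, 2, 768, 12, 3072  # vocab, positions, segments, hidden, layers, FFN

# Token + position + segment embeddings, plus the embedding LayerNorm (gamma, beta).
embeddings = (V + P + S) * H + 2 * H

# Per encoder layer:
attention = 4 * (H * H + H) + 2 * H              # Q, K, V, output projections + LayerNorm
ffn = (H * F + F) + (F * H + H) + 2 * H          # two dense layers + LayerNorm
encoder = L * (attention + ffn)

pooler = H * H + H                               # dense layer on the [CLS] token

total = embeddings + encoder + pooler
print(f"{total:,}")  # ≈ 109.5M, i.e. the "110M" usually quoted for BERT-base
```

Note that the embedding table alone is ~23.8M of the total, which is why conventions about whether input embeddings are counted can shift the reported number substantially.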