metanlp
How is the softmax classifier initialized in the Bert-Base model?
Is it zero-initialized?