pytorch_bert_japanese
Adding usage for transformers
Hi, thank you for creating an excellent repository!
I found that pytorch-pretrained-bert was replaced by transformers.
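For reference, the package rename itself only changes the import, roughly like this (a sketch; I'm assuming bert_juman.py imports BertTokenizer and BertModel):

```python
# Old import (pytorch-pretrained-bert):
# from pytorch_pretrained_bert import BertTokenizer, BertModel

# New import (transformers):
from transformers import BertTokenizer, BertModel
```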
Using transformers requires a modification to bert_juman.py.
The modification is to add the argument output_hidden_states=True so that the hidden states of every layer are returned from BERT's forward function.
You can find a detailed description of output_hidden_states=True at the link below:
https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bert.py#L688
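Here is a minimal sketch of what the modified loading and forward pass look like. The model path and token IDs are placeholders, and the tuple indexing assumes the transformers BertModel output format (last hidden state, pooler output, then the hidden states when output_hidden_states=True):

```python
import torch
from transformers import BertModel

# output_hidden_states=True makes the model return the hidden states of
# every layer in addition to the last layer's output.
model = BertModel.from_pretrained(
    "Japanese_L-12_H-768_A-12_E-30_BPE_transformers/",  # placeholder path
    output_hidden_states=True,
)
model.eval()

# Dummy token IDs; real usage tokenizes with Juman++ and BertTokenizer.
input_ids = torch.tensor([[2, 10, 200, 3]])
with torch.no_grad():
    outputs = model(input_ids)
    # outputs[0]: last hidden state, outputs[1]: pooler output,
    # outputs[2]: tuple of hidden states (embeddings + each encoder layer)
    all_encoder_layers = outputs[2]
```

In other words, the per-layer hidden states that pytorch-pretrained-bert returned directly now have to be requested explicitly and read from the output tuple.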
I have described the modification in this pull request; I hope it is helpful.
My results are below:
```python
In [5]: from bert_juman_with_transformers import BertWithJumanModel

In [6]: bert = BertWithJumanModel("../../MODELS/bert/Japanese_L-12_H-768_A-12_E-30_BPE_transformers/")

In [7]: bert.get_sentence_embedding("吾輩は猫である。").shape
Out[7]: (768,)

In [8]: bert.get_sentence_embedding("吾輩は猫である。")
Out[8]:
array([-4.25627619e-01, -3.42006892e-01, -7.15176389e-02, -1.09820056e+00,
        1.08186698e+00, -2.35575914e-01, -1.89862773e-01, -5.50958455e-01,
        1.87978148e-01, -9.03697014e-01, -2.67813027e-01, -1.49959311e-01,
        5.91513515e-01, -3.52201462e-01,  1.84209332e-01,  4.01529483e-02,
        1.53244898e-01, -6.31160438e-01, -2.07539946e-01, -1.49968192e-01,
       -3.31581414e-01,  4.01663631e-01,  3.73950928e-01, -4.13331598e-01,
```
Note that the embedding values differ because I used the alternative transformers model from Kurohashi Lab. (See the "(更新: 19/11/15)" ("Updated: 2019/11/15") note on their page.)
Thanks,