
Adding usage for transformers

Opened by SeitaroShinagawa · 0 comments

Hi, thank you for creating an excellent repository!

I found that pytorch-pretrained-bert has been replaced by transformers.
Using transformers requires a modification to bert_juman.py: the argument output_hidden_states=True must be passed so that BERT's forward function returns the hidden states.

A detailed description of output_hidden_states=True is available at the link below:
https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bert.py#L688
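
For reference, here is a minimal sketch of what the change looks like when loading the model. The checkpoint path and dummy token ids are illustrative; output_hidden_states=True is the actual modification, everything else is the standard transformers API:

import torch
from transformers import BertModel

MODEL_DIR = "path/to/Japanese_L-12_H-768_A-12_E-30_BPE_transformers"  # illustrative path

# The key change: with pytorch-pretrained-bert, forward() returned all
# encoder layers by default; with transformers, the hidden states must
# be requested explicitly when loading the model.
model = BertModel.from_pretrained(MODEL_DIR, output_hidden_states=True)
model.eval()

# Dummy token ids, standing in for the Juman-tokenized input that
# bert_juman.py actually produces.
input_ids = torch.tensor([[2, 9, 4816, 9, 5, 3]])

with torch.no_grad():
    outputs = model(input_ids)

# In transformers 3.x, outputs is a tuple:
#   outputs[0] -> last hidden state, shape (batch, seq_len, 768)
#   outputs[1] -> pooler output
#   outputs[2] -> hidden states: 13 tensors (embedding layer + 12 encoder layers)
# Newer versions return a ModelOutput; use outputs.hidden_states there.
all_hidden_states = outputs[2]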

I have described the modification in this pull request; I hope it will be helpful.
My results (output truncated) are attached below:

In [5]: from bert_juman_with_transformers import BertWithJumanModel

In [6]: bert = BertWithJumanModel("../../MODELS/bert/Japanese_L-12_H-768_A-12_E-30_BPE_transformers/")

In [7]: bert.get_sentence_embedding("吾輩は猫である。").shape
Out[7]: (768,)

In [8]: bert.get_sentence_embedding("吾輩は猫である。")
Out[8]:
array([-4.25627619e-01, -3.42006892e-01, -7.15176389e-02, -1.09820056e+00,
        1.08186698e+00, -2.35575914e-01, -1.89862773e-01, -5.50958455e-01,
        1.87978148e-01, -9.03697014e-01, -2.67813027e-01, -1.49959311e-01,
        5.91513515e-01, -3.52201462e-01,  1.84209332e-01,  4.01529483e-02,
        1.53244898e-01, -6.31160438e-01, -2.07539946e-01, -1.49968192e-01,
       -3.31581414e-01,  4.01663631e-01,  3.73950928e-01, -4.13331598e-01,

Note that the embedding values differ because I used the alternative transformers-compatible model from Kurohashi Lab (see the note "(更新: 19/11/15)", i.e. "Updated: 19/11/15").
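
Continuing the sketch above, the (768,)-dimensional vector in Out[7] can be obtained by pooling the hidden states. Mean pooling over the tokens of the second-to-last layer is one common choice; whether this matches bert_juman.py's default pooling exactly is an assumption on my part:

# Mean pooling over tokens of the second-to-last layer (assumed strategy).
sentence_embedding = all_hidden_states[-2][0].mean(dim=0)
print(sentence_embedding.numpy().shape)  # (768,), as in Out[7]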

Thanks,

SeitaroShinagawa, Aug 20 '20 19:08