
Is XLNet indeed context-aware?

Open studiocardo opened this issue 5 years ago • 5 comments

Hi All

I've been playing with spaCy and BERT, trying to see how the embedding of each word varies across different contexts.

For example, for the following three sentences:

```python
nlp = spacy.load("en_pytt_bertbaseuncased_lg")
apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")

print(apple1[0].similarity(apple2[0]))  # 0.73428553
print(apple1[0].similarity(apple3[0]))  # 0.43365782
```

```
0.7342856
0.43365765
```

As one would expect. So far so good. However, if I do the same with XLNet:

```python
nlp_xlnet = spacy.load("en_pytt_xlnetbasecased_lg")
apple1 = nlp_xlnet("Apple shares rose on the news.")
apple2 = nlp_xlnet("Apple sold fewer iPhones this quarter.")
apple3 = nlp_xlnet("Apple pie is delicious.")

print(apple1[0].similarity(apple2[0]))  # 0.9853272
print(apple1[0].similarity(apple3[0]))  # 0.9792127
```

```
0.9853272
0.9792127
```

This suggests that XLNet (at least in this example) is completely unaware of context. Given XLNet's stellar GLUE and SQuAD 2.0 results, I was really surprised by this finding. Granted, it's only a trivial example, but it still gives me pause.
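One way to sanity-check this outside spaCy is to pull the contextual vector for the first token directly from the model. A minimal sketch, assuming the standard `xlnet-base-cased` checkpoint from the Hugging Face transformers library (not the spaCy package used above):

```python
# Sanity check: compare the last-layer vector of the first token
# ("Apple") across two sentences, bypassing spaCy entirely.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased")
model.eval()

def first_token_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    # XLNet appends its special tokens (<sep>, <cls>) at the end,
    # so index 0 is the first content token ("▁Apple").
    return hidden[0, 0]

cos = torch.nn.CosineSimilarity(dim=0)
v_news = first_token_vector("Apple shares rose on the news.")
v_pie = first_token_vector("Apple pie is delicious.")
print(cos(v_news, v_pie).item())
```

If this also prints a near-1.0 similarity, the effect is in the model's raw hidden states rather than in spaCy's pooling.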

Has anyone else experienced similar results? Or have I done something wrong, or simply missed how the whole thing is supposed to work?

Thank you for your input. SH

studiocardo avatar Aug 28 '19 23:08 studiocardo

FYI, I tried several ways to construct a sentence embedding from the text input and hidden outputs. They all turned out to be surprisingly similar in cosine similarity (just like the result you got), while the same procedure applied to BERT produced the expected similarities and dissimilarities. I thought it might just be the absence of a sentence-level pretraining task, but seeing your result makes me wonder even more.
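For reference, a minimal sketch of one common pooling strategy (a masked mean over the last hidden layer), using the Hugging Face transformers library and the `xlnet-base-cased` checkpoint; the comment above does not say which strategies were actually tried:

```python
# One way to build a sentence embedding: masked mean pooling over
# XLNet's last hidden layer.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased")
model.eval()

def sentence_embedding(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state         # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1).float()  # (1, seq_len, 1)
    # Average only over real tokens, ignoring any padding.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, 768)

a = sentence_embedding("Apple shares rose on the news.")
b = sentence_embedding("Apple pie is delicious.")
print(torch.nn.functional.cosine_similarity(a, b).item())
```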

illuminascent avatar Aug 30 '19 08:08 illuminascent

What happens if you use the cased model of BERT 🤔

stefan-it avatar Aug 30 '19 08:08 stefan-it

I am aware of the casing discrepancy. However, I can only use what ships with spaCy… :(

I should have tried more examples with lowercased words… I'll do that and report the results.

SH

studiocardo avatar Aug 31 '19 03:08 studiocardo

I have observed a similar issue with context for word embeddings, which may explain why it behaves the same way at the sentence level.

ELMo, BERT, and ALBERT are all aware of the context. For example: “Bank river.” vs. “Bank robber.”

The word “Bank” gets a different embedding vector in each, since the context is different; unfortunately, in XLNet “Bank” has the same embedding in both. (A layer-wise check is sketched after the link below.)

https://github.com/zihangdai/xlnet/issues/264
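For anyone digging further, here is a minimal sketch that inspects each XLNet layer separately, since a pooled vector can hide layer-wise differences. It uses the Hugging Face transformers `xlnet-base-cased` checkpoint; the choice to inspect every layer is my own illustration, not something from the linked issue:

```python
# Compare the vector for "Bank" (token 0) across the two contexts,
# layer by layer, to see whether any layer is context-sensitive.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased", output_hidden_states=True)
model.eval()

def all_layer_states(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Tuple of (1, seq_len, 768) tensors: embeddings + one per layer.
        return model(**inputs).hidden_states

cos = torch.nn.CosineSimilarity(dim=0)
river = all_layer_states("Bank river.")
robber = all_layer_states("Bank robber.")
for layer, (a, b) in enumerate(zip(river, robber)):
    print(f"layer {layer}: {cos(a[0, 0], b[0, 0]).item():.4f}")
```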

maziyarpanahi avatar Apr 29 '20 18:04 maziyarpanahi

Did anyone figure this out? I am still experiencing the same issue with no solution: https://github.com/zihangdai/xlnet/issues/264

maziyarpanahi avatar Aug 01 '20 16:08 maziyarpanahi