
Is XLNet indeed context-aware?

Open studiocardo opened this issue 5 years ago • 5 comments

Hi All

I've been playing with spaCy and BERT, trying to see how the embedding of each word varies across different contexts.

For example, for the following three sentences:

```python
nlp = spacy.load("en_pytt_bertbaseuncased_lg")
apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")

print(apple1[0].similarity(apple2[0]))  # 0.73428553
print(apple1[0].similarity(apple3[0]))  # 0.43365782
```

```
0.7342856
0.43365765
```

As one would expect. So far so good. However, if I do the same with XLNet:

```python
nlp_xlnet = spacy.load("en_pytt_xlnetbasecased_lg")
apple1 = nlp_xlnet("Apple shares rose on the news.")
apple2 = nlp_xlnet("Apple sold fewer iPhones this quarter.")
apple3 = nlp_xlnet("Apple pie is delicious.")

print(apple1[0].similarity(apple2[0]))  # 0.9853272
print(apple1[0].similarity(apple3[0]))  # 0.9792127
```

```
0.9853272
0.9792127
```

This suggests that XLNet (at least in this example) is completely unaware of context. Given XLNet's stellar GLUE and SQuAD 2.0 results, I was really surprised by this finding. Granted, it's only a trivial example, but it still gives me pause.
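One way to sanity-check this outside spaCy is to pull the contextual vector for the first token directly from the model. A minimal sketch, assuming the standard `xlnet-base-cased` checkpoint from the Hugging Face transformers library (not the spaCy package used above):

```python
# Sanity check: compare the last-layer vector of the first token
# ("Apple") across two sentences, bypassing spaCy entirely.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased")
model.eval()

def first_token_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    # XLNet appends its special tokens (<sep>, <cls>) at the end,
    # so index 0 is the first content token ("▁Apple").
    return hidden[0, 0]

cos = torch.nn.CosineSimilarity(dim=0)
v_news = first_token_vector("Apple shares rose on the news.")
v_pie = first_token_vector("Apple pie is delicious.")
print(cos(v_news, v_pie).item())
```

If this also prints a near-1.0 similarity, the effect is in the model's raw hidden states rather than in spaCy's pooling.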

Has anyone else experienced similar results? Or have I done something wrong, or simply missed how the whole thing is supposed to work?

Thank you for your input. SH

studiocardo avatar Aug 28 '19 23:08 studiocardo

FYI, I tried several ways to construct a sentence embedding from the text input and hidden outputs. They all turned out to be surprisingly similar in cosine similarity (just like the result you got), while the same procedure applied to BERT produced the expected similarities and dissimilarities. I thought it might just be the absence of a sentence-level pretraining task, but seeing your result makes me wonder even more.
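For reference, a minimal sketch of one common pooling strategy (a masked mean over the last hidden layer), using the Hugging Face transformers library and the `xlnet-base-cased` checkpoint; the comment above does not say which strategies were actually tried:

```python
# One way to build a sentence embedding: masked mean pooling over
# XLNet's last hidden layer.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased")
model.eval()

def sentence_embedding(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state         # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1).float()  # (1, seq_len, 1)
    # Average only over real tokens, ignoring any padding.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, 768)

a = sentence_embedding("Apple shares rose on the news.")
b = sentence_embedding("Apple pie is delicious.")
print(torch.nn.functional.cosine_similarity(a, b).item())
```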

illuminascent avatar Aug 30 '19 08:08 illuminascent

What happens if you use the cased model of BERT 🤔

stefan-it avatar Aug 30 '19 08:08 stefan-it

I am aware of the casing discrepancy. However, I can only use what ships with spaCy… :(

I should have tried more examples with lowercased words… I'll do that and report the results.

SH

studiocardo avatar Aug 31 '19 03:08 studiocardo

I have observed a similar issue with context for word embeddings, which may explain why it behaves the same way at the sentence level.

ELMo, BERT, and ALBERT are all aware of the context. For example: “Bank river.” vs. “Bank robber.”

The word “Bank” gets a different embedding vector in each, since the context is different; unfortunately, in XLNet “Bank” has the same embedding in both. (A layer-wise check is sketched after the link below.)

https://github.com/zihangdai/xlnet/issues/264
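For anyone digging further, here is a minimal sketch that inspects each XLNet layer separately, since a pooled vector can hide layer-wise differences. It uses the Hugging Face transformers `xlnet-base-cased` checkpoint; the choice to inspect every layer is my own illustration, not something from the linked issue:

```python
# Compare the vector for "Bank" (token 0) across the two contexts,
# layer by layer, to see whether any layer is context-sensitive.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModel.from_pretrained("xlnet-base-cased", output_hidden_states=True)
model.eval()

def all_layer_states(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Tuple of (1, seq_len, 768) tensors: embeddings + one per layer.
        return model(**inputs).hidden_states

cos = torch.nn.CosineSimilarity(dim=0)
river = all_layer_states("Bank river.")
robber = all_layer_states("Bank robber.")
for layer, (a, b) in enumerate(zip(river, robber)):
    print(f"layer {layer}: {cos(a[0, 0], b[0, 0]).item():.4f}")
```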

maziyarpanahi avatar Apr 29 '20 18:04 maziyarpanahi

Did anyone figure this out? I am still experiencing the same issue with no solution: https://github.com/zihangdai/xlnet/issues/264

maziyarpanahi avatar Aug 01 '20 16:08 maziyarpanahi