Sentiment analysis on emoji data.

Open PrashanthAadepu opened this issue 6 years ago • 10 comments

I am using BERT to do sentiment classification. I am currently classifying into positive, negative and neutral.

I have some data with emojis and it is always classifying them as neutral. I think I am missing something here.

Could someone explain how to handle emoji data so that it is classified correctly?

Thanks in advance.

PrashanthAadepu avatar Jul 04 '19 10:07 PrashanthAadepu

Hello, could you elaborate on what you mean by "data with emojis"? Do you mean the emoji alone, or with some surrounding text? As far as I remember, the author of this repo has added emoji support. Thanks

aditya-malte avatar Jul 04 '19 12:07 aditya-malte

Hi,

I have data like the below sentences.

😍 Love your service period! 😂😂😂😉🤗💕

When I classify sentences that contain only emojis, they are always predicted as neutral.

Thanks.

PrashanthAadepu avatar Jul 04 '19 12:07 PrashanthAadepu

Interesting. Does your training data consist of a mixture of text with emojis and emoji-less text? Or do all of the examples have emojis?

aditya-malte avatar Jul 04 '19 12:07 aditya-malte

The data has the variants below:

1. Sentence with no emoji. Ex: Very useful for customers
2. Sentence with text and emoji. Ex: 😍 Love your service period!
3. Sentence with only emoji. Ex: 😂😂😂😉🤗💕

Thanks.

PrashanthAadepu avatar Jul 04 '19 13:07 PrashanthAadepu

That's very surprising. Could you share the hyperparameters you used so that I can see if something is wrong?

aditya-malte avatar Jul 04 '19 13:07 aditya-malte

Hello

I am using the notebook below, tweaked to also classify neutral sentiment.

https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb

I am using the tokenizer below to tokenize the emojis properly: https://github.com/google-research/bert/blob/master/tokenization.py

I also added some emojis to vocab.txt, which I pass to model training.

Thanks.

PrashanthAadepu avatar Jul 04 '19 13:07 PrashanthAadepu
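
One thing worth checking with this setup is how a run of emojis with no spaces between them is split. The sketch below is a simplified, standalone re-implementation of greedy longest-match-first WordPiece splitting, not the code from tokenization.py, and the tiny `vocab` set is made up for illustration. It assumes the run reaches the WordPiece step as a single whitespace-delimited token, which appears to be the case for emojis since they are symbols rather than punctuation. Under that assumption, adding only the plain emoji entries to the vocabulary is not enough: every piece after the first is looked up with a `##` prefix, and one failed lookup turns the whole token into `[UNK]`.

```python
def wordpiece(token, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one whitespace-delimited token.

    Simplified sketch: pieces after the first are looked up with a "##"
    prefix, and a single failed lookup maps the whole token to `unk`.
    """
    pieces, start = [], 0
    while start < len(token):
        end = len(token)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = token[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical vocabulary: plain emoji entries only.
vocab = {"love", "😂", "😉"}
print(wordpiece("😂", vocab))      # ['😂']
print(wordpiece("😂😂😉", vocab))  # ['[UNK]']

# Same vocabulary plus the "##" continuation forms.
vocab_with_suffixes = vocab | {"##😂", "##😉"}
print(wordpiece("😂😂😉", vocab_with_suffixes))  # ['😂', '##😂', '##😉']
```

If emoji-only inputs all collapse to the same `[UNK]` token, the classifier has nothing to tell them apart, which would be consistent with them all receiving one label. Note also that appending new rows to vocab.txt gives them ids beyond the size of the pretrained embedding table, so it may be safer to overwrite unused placeholder entries instead; treat this as something to verify against the checkpoint in use.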

Hey! Any improvement on that front? This seems surprising, since emojis should now be taken into account by the BERT tokenizer. An older version treated an emoji as an UNK token.

dataislife avatar Oct 21 '19 17:10 dataislife

Hi Prashanth,

Could you share your code tweaks to classify neutral sentiments? I am starting with the same notebook and am actively struggling with making the same tweaks.

Thank you.

bobbyinfj avatar Nov 14 '19 05:11 bobbyinfj

Hi, I am also trying to use BERT on data containing emojis, but they are always encoded as <UNK> by the tokenizer. Has there been any progress on getting emojis processed correctly?

freeIsa avatar Jun 23 '20 09:06 freeIsa
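
A quick way to narrow this down is to separate emojis with spaces before tokenization and to list which ones have no vocabulary entry at all. The helpers below are a minimal sketch using only the Python standard library; the function names are hypothetical, the `vocab` argument is assumed to be the set of lines read from vocab.txt, and the Unicode category `So` (Symbol, other) is used as a rough stand-in for "emoji", so it will miss some characters such as skin-tone modifiers and include some non-emoji symbols.

```python
import unicodedata


def is_symbol(ch):
    """Rough emoji test: Unicode general category 'So' (Symbol, other)."""
    return unicodedata.category(ch) == "So"


def space_out_symbols(text):
    """Surround every 'So' character with spaces, then collapse whitespace,
    so that each emoji becomes its own whitespace-delimited token."""
    padded = "".join(f" {ch} " if is_symbol(ch) else ch for ch in text)
    return " ".join(padded.split())


def missing_symbols(text, vocab):
    """Return the 'So' characters in `text` that have no entry in `vocab`."""
    return sorted({ch for ch in text if is_symbol(ch) and ch not in vocab})


print(space_out_symbols("😍 Love your service period! 😂😂😂😉🤗💕"))
print(missing_symbols("😂😉", {"😂"}))
```

Any character reported by `missing_symbols` cannot be represented by the tokenizer however the text is split, so it would need a vocabulary entry before it can carry any signal to the model.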

Hello, could you please share the dataset with emojis? That would be very helpful.

Vithurshana avatar Jun 21 '23 09:06 Vithurshana