
Add Twitter validation dataset

Open thisandagain opened this issue 8 years ago • 3 comments

We currently validate against a dataset from UCI that includes Amazon, Yelp, and IMDB. This is great, but it would be nice to also validate against less formal text, particularly text that includes emoji. Twitter is widely used as a corpus across NLP, so I don't think a suitable dataset should be too difficult to track down, but it will require some research.

thisandagain avatar Mar 22 '17 14:03 thisandagain

Related to #24

thisandagain avatar Mar 22 '17 14:03 thisandagain

On this subject, this link might help: https://finnaarupnielsen.wordpress.com/2011/03/16/afinn-a-new-word-list-for-sentiment-analysis/

Edit: Apologies - I misunderstood the issue. I see you already use that and this issue is purely for validation.

dparlevliet avatar Oct 26 '17 11:10 dparlevliet

I think this is what you are looking for: https://old.datahub.io/dataset/twitter-sentiment-analysis It may also be interesting to examine how well this works against longer texts; one example is the Cornell Movie Review Dataset.

pdw207 avatar Jun 20 '18 02:06 pdw207
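
For illustration, the kind of validation pass discussed in this thread could be sketched roughly as follows. This is a minimal Python sketch, not the project's actual validation code: the tiny `LEXICON`, the `score` and `validate` helpers, and the four labeled `SAMPLES` are all hypothetical and made up for the example. A real run would load an AFINN-style word list and a labeled tweet dataset such as the one linked above instead.

```python
import re

# Toy AFINN-style lexicon. The words, emoji, and weights are hypothetical
# and exist only to make the example self-contained.
LEXICON = {
    "love": 3, "great": 3, "good": 2,
    "bad": -2, "hate": -3, "awful": -3,
    "😍": 3, "😡": -3,
}

# Words become tokens; any other non-space symbol (including an emoji)
# becomes a single-character token.
TOKEN_RE = re.compile(r"[a-z']+|[^\w\s]")


def score(text):
    """Sum the lexicon weights of all tokens in the text."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def validate(samples):
    """Return the fraction of (text, label) pairs classified correctly.

    label is 1 for positive and 0 for negative; a score above zero
    counts as a positive prediction.
    """
    correct = 0
    for text, label in samples:
        predicted = 1 if score(text) > 0 else 0
        if predicted == label:
            correct += 1
    return correct / len(samples)


# Hypothetical hand-written samples standing in for a labeled tweet dataset.
SAMPLES = [
    ("I love this phone 😍", 1),
    ("worst service ever, I hate it 😡", 0),
    ("good morning everyone", 1),
    ("this is not good", 0),  # negation: a plain word-sum gets this wrong
]

if __name__ == "__main__":
    print(f"accuracy: {validate(SAMPLES):.2f}")
```

The last sample is there on purpose: a pure word-sum scorer ignores negation, which is the sort of weakness an informal-text validation set would help surface.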