WladimirSidorenko/SemEval-2016: Twitter Sentiment System for SemEval 2016

SemEval-2016 Task 4: Sentiment Analysis on Twitter *

          TRAINING + DEV DATA                  *

http://alt.qcri.org/semeval2016/task4/ *
[email protected] *

TRAINING + DEV dataset for SemEval-2016 Task 4

Version 1.0: October 15, 2015

Task organizers:

Preslav Nakov, Qatar Computing Research Institute, HBKU
Alan Ritter, The Ohio State University
Sara Rosenthal, Columbia University
Fabrizio Sebastiani, Qatar Computing Research Institute, HBKU
Veselin Stoyanov, Facebook

NOTES

Please note that by downloading the Twitter data you agree to abide by the Twitter terms of service (https://twitter.com/tos), and in particular you agree not to redistribute the data and to delete tweets that are marked deleted in the future.
The distribution consists of a set of Twitter status IDs with annotations for Subtasks A, B, C, D, and E: topic polarity and trends toward a topic. There are exactly 100 tweets provided per topic and a total of 100 topics. You should use the downloading script to obtain the corresponding tweets: https://github.com/aritter/twitter_download
The "neutral" label in the annotations stands for objective_OR_neutral.

FILES

data/train/src/100_topics_100_tweets.topic-two-point.subtask-BD.train.txt -- training input for subtasks B and D data/train/src/100_topics_100_tweets.topic-five-point.subtask-CE.train.txt -- training input for subtasks C and E

data/dev/src/100_topics_100_tweets.topic-two-point.subtask-BD.dev.txt -- dev input for subtasks B and D data/dev/src/100_topics_100_tweets.topic-five-point.subtask-CE.dev.txt -- dev input for subtasks C and E

INPUT DATA FORMAT

-----------------------SUBTASK A-----------------------------------------

The format for the training/dev file is as follows:

id<TAB>label

where "label" can be 'positive', 'neutral' or 'negative'.

-----------------------SUBTASKS B,D-------------------------------------- ** Task we might deal with.

The format for the training/dev file is as follows:

topic<TAB>id<TAB>label

where "label" can be 'positive' or 'negative' (note: no 'neutral'!).

-----------------------SUBTASKS C,E--------------------------------------

Task we are dealing with.

The format for the training/dev file is as follows:

topic<TAB>id<TAB>label

where "label" can be -2, -1, 0, 1, or 2, corresponding to "strongly negative", "negative", "negative or neutral", "positive", and "strongly positive".

LICENSE

The accompanying dataset is released under a Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/).

CITATION

You can cite the folowing paper when referring to the dataset:

@InProceedings{Rosenthal-EtAl:2015:SemEval, author = {Sara Rosenthal and Alan Ritter and Veselin Stoyanov and Svetlana Kiritchenko and Saif Mohammad and Preslav Nakov}, title = {SemEval-2015 Task 10: Sentiment Analysis in Twitter}, booktitle = {Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)}, year = {2015}, publisher = {Association for Computational Linguistics}, }

USEFUL LINKS:

Google group: [email protected] SemEval-2016 Task 4 website: http://alt.qcri.org/semeval2016/task4/ SemEval-2016 website: http://alt.qcri.org/semeval2016/

SemEval-2016
SemEval-2016 copied to clipboard

Metadata

← Metadata

Owner

Metadata

SemEval-2016 SemEval-2016 copied to clipboard

Metadata

← Metadata

Owner

Metadata

SemEval-2016
SemEval-2016 copied to clipboard