deeplearning4j-docs Text classification tutorial on Reuters dataset

Due Date

To be completed by: 2018-05-30

Description

Use the Reuters news dataset to create a tutorial for text classification.

Assignees

Please ensure you have assigned at least one person to this issue. Include any authors and reviewers required.

Apr 18 '18 05:04 crockpotveggies

We should put some effort into finding a more relevant use case and data set.

For example, how about sentiment classification?

opinmind.com: https://www.kaggle.com/c/si650winter11/data
RottenTomatoes reviews: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data
Sentiment treebank: https://nlp.stanford.edu/sentiment/
Movie reviews: http://ai.stanford.edu/~amaas/data/sentiment/
Multi-domain product reviews: https://www.cs.jhu.edu/~mdredze/datasets/sentiment/
SemEval2013 Twitter: https://www.cs.york.ac.uk/semeval-2013/task2/
SemEval2014 Twitter: http://alt.qcri.org/semeval2014/task9/
SemEval2015 Twitter: http://alt.qcri.org/semeval2015/task10/
SemEval2016 ecommerce: http://alt.qcri.org/semeval2016/task5/
Amazon Reviews: http://snap.stanford.edu/data/web-Amazon.html
Opinion mining: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
Yelp open dataset: https://www.yelp.com/dataset
More movie reviews: http://www.cs.cornell.edu/people/pabo/movie-review-data/
IMDB and yelp reviews: https://github.com/thunlp/NSC
Sentiment analysis dataset (thinknook): thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip

SNAP appears to have a bunch of cool data: http://snap.stanford.edu/data/

There are a few cool things on UCI: https://archive.ics.uci.edu/ml/datasets.html?format=&task=&att=&area=&numAtt=&numIns=&type=text&sort=nameUp&view=table

Apr 18 '18 21:04 turambar

@turambar I trust your opinion. Which of these datasets/use cases do you recommend? Pick one and let's go with it.

Apr 19 '18 00:04 crockpotveggies

Given the latest rumors about UIPath, I'd say we should find something that is as close as possible to the sort of text classification use cases we'll be working on for them.

Apr 20 '18 03:04 turambar

I have written a Java iterator for the Reuters dataset: https://github.com/AltA-Advisory/ReutersParser

Hopefully it makes our lives a little easier.
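
For readers unfamiliar with the data, here is a rough sketch of what an iterator over the Reuters files could look like. It is not the code in the linked repository. It assumes the standard Reuters-21578 SGML layout, where each article sits in a <REUTERS> element with its labels as <D> entries under <TOPICS> and its text under <BODY>. The class names ReutersArticle and ReutersSgmlIterator are hypothetical, and SGML entities are left undecoded to keep it short.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed article: topic labels plus body text. Hypothetical type, not from ReutersParser. */
final class ReutersArticle {
    final List<String> topics;
    final String body;

    ReutersArticle(List<String> topics, String body) {
        this.topics = topics;
        this.body = body;
    }
}

/** Iterates over the <REUTERS>...</REUTERS> records found in the text of one .sgm file. */
final class ReutersSgmlIterator implements Iterator<ReutersArticle> {
    private static final Pattern DOC = Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern TOPICS = Pattern.compile("<TOPICS>(.*?)</TOPICS>", Pattern.DOTALL);
    private static final Pattern LABEL = Pattern.compile("<D>(.*?)</D>");
    private static final Pattern BODY = Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    private final Matcher docs;
    private boolean advanced; // true once find() has run for the pending record
    private boolean found;    // result of that find()

    ReutersSgmlIterator(String sgml) {
        this.docs = DOC.matcher(sgml);
    }

    @Override
    public boolean hasNext() {
        if (!advanced) {
            found = docs.find();
            advanced = true;
        }
        return found;
    }

    @Override
    public ReutersArticle next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        advanced = false;
        String doc = docs.group(1);

        // Only read <D> entries inside <TOPICS>; <PLACES> and others use <D> too.
        List<String> topics = new ArrayList<>();
        Matcher topicBlock = TOPICS.matcher(doc);
        if (topicBlock.find()) {
            Matcher label = LABEL.matcher(topicBlock.group(1));
            while (label.find()) {
                topics.add(label.group(1));
            }
        }

        // Some records have no <BODY>; return an empty string for those.
        Matcher bodyBlock = BODY.matcher(doc);
        String body = bodyBlock.find() ? bodyBlock.group(1).trim() : "";
        return new ReutersArticle(topics, body);
    }
}
```

Reading a file into a String and passing it to the constructor would then yield (topics, body) pairs that a tutorial could vectorize and feed to a classifier. Records with an empty topic list would likely need to be filtered out first.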

Apr 20 '18 04:04 RobAltena

Thanks @RobAltena: if we proceed with Reuters, we'll check it out!

Apr 20 '18 04:04 turambar
