deeplearning4j-docs
Text classification tutorial on Reuters dataset
Due Date
To be completed by: 2018-05-30
Description
Use the Reuters news dataset to create a tutorial for text classification.
Assignees
Please ensure you have assigned at least one person to this issue. Include any authors and reviewers required.
We should put some effort into finding a more relevant use case and dataset.
For example, how about sentiment classification?
- opinmind.com: https://www.kaggle.com/c/si650winter11/data
- RottenTomatoes reviews: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data
- Sentiment treebank: https://nlp.stanford.edu/sentiment/
- Movie reviews: http://ai.stanford.edu/~amaas/data/sentiment/
- Multi-domain product reviews: https://www.cs.jhu.edu/~mdredze/datasets/sentiment/
- SemEval2013 Twitter: https://www.cs.york.ac.uk/semeval-2013/task2/
- SemEval2014 Twitter: http://alt.qcri.org/semeval2014/task9/
- SemEval2015 Twitter: http://alt.qcri.org/semeval2015/task10/
- SemEval2016 ecommerce: http://alt.qcri.org/semeval2016/task5/
- Amazon Reviews: http://snap.stanford.edu/data/web-Amazon.html
- Opinion mining: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
- Yelp open dataset: https://www.yelp.com/dataset
- More movie reviews: http://www.cs.cornell.edu/people/pabo/movie-review-data/
- IMDB and yelp reviews: https://github.com/thunlp/NSC
- Sentiment Analysis Dataset (thinknook.com): thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip
SNAP appears to have a bunch of cool data: http://snap.stanford.edu/data/
There are a few cool things on UCI: https://archive.ics.uci.edu/ml/datasets.html?format=&task=&att=&area=&numAtt=&numIns=&type=text&sort=nameUp&view=table
@turambar I trust your opinion: which of these datasets/use cases do you recommend? Pick one and let's go with it.
Given the latest rumors about UIPath, I'd say we should find something that is as close as possible to the sort of text classification use cases we'll be working on for them.
I have written a java iterator for the Reuters dataset: https://github.com/AltA-Advisory/ReutersParser
Hopefully it makes our lives a little easier.
Thanks @RobAltena: if we proceed with Reuters, we'll check it out!