deeplearning4j-docs

Text classification tutorial on Reuters dataset

Status: Open. crockpotveggies opened this issue 6 years ago • 5 comments

Due Date

To be completed by: 2018-05-30

Description

Use the Reuters news dataset to create a tutorial for text classification.

Assignees

Please ensure you have assigned at least one person to this issue. Include any authors and reviewers required.

crockpotveggies, Apr 18 '18 05:04

We should put some effort into finding a more relevant use case and dataset.

For example, how about sentiment classification?

  • opinmind.com: https://www.kaggle.com/c/si650winter11/data
  • Rotten Tomatoes reviews: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data
  • Sentiment treebank: https://nlp.stanford.edu/sentiment/
  • Movie reviews: http://ai.stanford.edu/~amaas/data/sentiment/
  • Multi-domain product reviews: https://www.cs.jhu.edu/~mdredze/datasets/sentiment/
  • SemEval2013 Twitter: https://www.cs.york.ac.uk/semeval-2013/task2/
  • SemEval2014 Twitter: http://alt.qcri.org/semeval2014/task9/
  • SemEval2015 Twitter: http://alt.qcri.org/semeval2015/task10/
  • SemEval2016 ecommerce: http://alt.qcri.org/semeval2016/task5/
  • Amazon Reviews: http://snap.stanford.edu/data/web-Amazon.html
  • Opinion mining: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
  • Yelp open dataset: https://www.yelp.com/dataset
  • More movie reviews: http://www.cs.cornell.edu/people/pabo/movie-review-data/
  • IMDB and Yelp reviews: https://github.com/thunlp/NSC
  • Thinknook sentiment analysis dataset: thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip

SNAP appears to have a bunch of cool data: http://snap.stanford.edu/data/

There are a few cool things on UCI: https://archive.ics.uci.edu/ml/datasets.html?format=&task=&att=&area=&numAtt=&numIns=&type=text&sort=nameUp&view=table

turambar, Apr 18 '18 21:04

@turambar I trust your opinion: which of these datasets/use cases do you recommend? Pick one and let's go with it.

crockpotveggies, Apr 19 '18 00:04

Given the latest rumors about UIPath, I'd say we should find something that is as close as possible to the sort of text classification use cases we'll be working on for them.

turambar, Apr 20 '18 03:04

I have written a Java iterator for the Reuters dataset: https://github.com/AltA-Advisory/ReutersParser

Hopefully it makes our lives a little easier.

RobAltena, Apr 20 '18 04:04

Thanks @RobAltena: if we proceed with Reuters, we'll check it out!

turambar, Apr 20 '18 04:04