rcnn-text-classification
TensorFlow implementation of "Recurrent Convolutional Neural Network for Text Classification" (AAAI 2015)
Recurrent Convolutional Neural Network for Text Classification
Data: Movie Review
- Movie reviews with one sentence per review. Classification involves detecting positive/negative reviews (Pang and Lee, 2005).
- Download "sentence polarity dataset v1.0" from the <U>Official Download Page</U>.
- The data is also included in this repository under <U>"data/rt-polaritydata/"</U>.
- rt-polarity.pos contains 5331 positive snippets.
- rt-polarity.neg contains 5331 negative snippets.
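A minimal sketch of reading the two files into labeled examples, assuming one snippet per line as described above. The helper name `load_polarity`, the label convention (1 for positive, 0 for negative), and the `latin-1` default encoding are assumptions for illustration and are not taken from this repository's code; adjust the encoding to match your copy of the files.

```python
def load_polarity(pos_path, neg_path, encoding="latin-1"):
    """Read one-snippet-per-line files into parallel text/label lists.

    Hypothetical helper: labels are 1 (positive) and 0 (negative).
    """
    texts, labels = [], []
    for path, label in ((pos_path, 1), (neg_path, 0)):
        with open(path, encoding=encoding) as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    texts.append(line)
                    labels.append(label)
    return texts, labels
```

With the full dataset, `load_polarity("data/rt-polaritydata/rt-polarity.pos", "data/rt-polaritydata/rt-polarity.neg")` should return 10662 snippets, half of them labeled 1.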
Implementation of Recurrent Structure
- A bidirectional RNN (Bi-RNN) is used to compute the left and right context vectors.
- Each context vector is created by shifting the Bi-RNN output by one time step and concatenating a zero state that marks the start of the context.
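The shift-and-pad step can be sketched for a single sentence as follows. This is an illustration under stated assumptions, not the repository's code: plain Python lists stand in for tensors, the function name `shift_contexts` is hypothetical, and the sentence is assumed to be non-empty.

```python
def shift_contexts(fw_outputs, bw_outputs):
    """Build left/right context vectors for one non-empty sentence.

    fw_outputs[i] and bw_outputs[i] are the forward and backward RNN
    outputs at word i (lists of floats standing in for tensors).
    """
    hidden = len(fw_outputs[0])
    zero = [0.0] * hidden  # zero state marking the start of a context
    # Left context of word i is the forward output at word i-1.
    c_left = [zero] + fw_outputs[:-1]
    # Right context of word i is the backward output at word i+1.
    c_right = bw_outputs[1:] + [zero]
    return c_left, c_right


# Toy example: 3 words, hidden size 2.
fw = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
bw = [[-1.0, -1.0], [-2.0, -2.0], [-3.0, -3.0]]
c_left, c_right = shift_contexts(fw, bw)
# c_left  == [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
# c_right == [[-2.0, -2.0], [-3.0, -3.0], [0.0, 0.0]]
```

The first word has no left neighbor and the last word has no right neighbor, so those two positions receive the zero state.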
Usage
Train
- Positive data is located in <U>"data/rt-polaritydata/rt-polarity.pos"</U>.
- Negative data is located in <U>"data/rt-polaritydata/rt-polarity.neg"</U>.
- "GoogleNews-vectors-negative300" is used as the pre-trained word2vec model.
- Display help message:
    python train.py --help
- Train Example:
    python train.py --cell_type "lstm" \
      -pos_dir "data/rt-polaritydata/rt-polarity.pos" \
      -neg_dir "data/rt-polaritydata/rt-polarity.neg" \
      -word2vec "GoogleNews-vectors-negative300.bin"
Evaluation
- The Movie Review dataset has no separate test set.
- To evaluate, create a test set from the training data or use cross-validation. Cross-validation is not implemented in this project.
- The example below simply evaluates on the full rt-polarity dataset, which is the same data used for training.
- Evaluation Example:
    python eval.py \
      -pos_dir "data/rt-polaritydata/rt-polarity.pos" \
      -neg_dir "data/rt-polaritydata/rt-polarity.neg" \
      -checkpoint_dir "runs/1523902663/checkpoints"
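Since cross-validation is not part of this project, the following is only a rough sketch of how k-fold index splits could be generated before training and evaluating on each fold. The function name `k_fold_indices`, the default of 10 folds, and the fixed seed are assumptions for illustration; wiring the splits into train.py and eval.py is left to the reader.

```python
import random


def k_fold_indices(n_examples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test_idx in enumerate(folds):
        # Every fold except fold i goes into the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

Each example index appears in exactly one test fold, so averaging the per-fold accuracy gives an estimate that does not reuse training examples for testing.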
Result
- Comparison between the Recurrent Convolutional Neural Network and a Convolutional Neural Network.
- dennybritz's cnn-text-classification-tf is used as the CNN model for comparison.
- The same pre-trained word2vec model is used for both models.
Accuracy for validation set
Loss for validation set
Reference
- Recurrent Convolutional Neural Network for Text Classification (AAAI 2015), S Lai et al. [paper]