
Performance on the paper's dataset

Open gabrer opened this issue 7 years ago • 6 comments

The performance reported in the README was not computed on the dataset used in the original paper (Hierarchical Attention Networks for Document Classification, Yang et al. 2016).

To understand the real performance of this implementation, it would be better to report accuracy on that dataset, where the training, dev, and test splits are predefined. The dataset can be downloaded from Duyu Tang's homepage. Download link: http://ir.hit.edu.cn/~dytang/paper/emnlp2015/emnlp-2015-data.7z

gabrer avatar Sep 27 '18 09:09 gabrer

@gabrer Hi, have you re-implemented the model on the original paper's dataset? In my implementation, I can't reproduce the performance reported in the paper.

superzhangxing avatar Oct 21 '18 12:10 superzhangxing

@superzhangxing Hi, in my implementation I can only get 63.7% accuracy on the Yelp 2013 dev set with the same configuration as the paper. The accuracy on the test set is likely a bit lower still, which falls short of the performance in the paper. Do you have any ideas? Are there tricks that could close the gap?

MH23333 avatar Dec 10 '18 08:12 MH23333

@MH23333 Hi, do you use pre-trained word embeddings or randomly initialized ones? I train them with word2vec on the train and dev sets; I believe this improves accuracy. With the same configuration as the paper (e.g. SGD with momentum, momentum set to 0.9), I get around 67% accuracy on the dev and test sets. The only trick I use is aligning sentence lengths within each batch to accelerate training, which is also mentioned in the paper.
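For reference, the length-alignment trick can be sketched roughly like this (a minimal illustration, not the code from this repo; `length_aligned_batches` and `pad_batch` are hypothetical names). Sorting documents by length before batching means each batch only needs padding up to its own maximum, which cuts wasted computation:

```python
import random

def length_aligned_batches(docs, batch_size):
    """Group documents of similar length into the same batch.

    Sorting by length first means each batch is padded only to its
    own maximum length, not the global maximum, which speeds up
    training. `docs` is a list of token-id lists.
    """
    ordered = sorted(docs, key=len)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    random.shuffle(batches)  # shuffle batch order, not batch contents
    return batches

def pad_batch(batch, pad_token=0):
    """Pad every sequence in the batch to the batch's own max length."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_token] * (max_len - len(seq)) for seq in batch]
```

Shuffling whole batches (rather than individual documents) keeps some randomness in training while preserving the length grouping.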

superzhangxing avatar Dec 10 '18 09:12 superzhangxing

@superzhangxing Thanks for your quick reply! I use word embeddings the same way as you do, and I set all the hyperparameters mentioned in the paper. Maybe other, unmentioned hyperparameters have an important influence on the results. Two in particular may matter: the sentence length (how many words in a sentence) and the document length (how many sentences in a document). I will run more experiments. Many thanks!

MH23333 avatar Dec 10 '18 13:12 MH23333

@MH23333 The max sentence length and max document length are both set to 40. Please note that I use a dynamic RNN, so I don't use a fixed sentence length or a fixed document length. I'm not sure whether that has an influence.
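Concretely, the preprocessing this implies might look like the following sketch (my own illustration, assuming a 40/40 cap; `truncate_and_measure` is a hypothetical helper). Documents are clipped to the maximums, but the *true* lengths are recorded and handed to the dynamic RNN (e.g. via `tf.nn.dynamic_rnn`'s `sequence_length` argument) so padded positions are skipped:

```python
def truncate_and_measure(doc, max_sents=40, max_words=40, pad=0):
    """Clip a document to at most `max_sents` sentences of at most
    `max_words` words each, padding sentences to `max_words`.

    Returns the padded document plus the true sentence count and
    per-sentence word counts, which a dynamic RNN uses to stop at
    the real sequence end instead of a fixed length.
    `doc` is a list of sentences; each sentence is a list of word ids.
    """
    sents = [s[:max_words] for s in doc[:max_sents]]
    word_lens = [len(s) for s in sents]
    padded = [s + [pad] * (max_words - len(s)) for s in sents]
    return padded, len(sents), word_lens
```

The cap keeps tensors bounded, while the recorded lengths are what make the RNN "dynamic" in practice.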

superzhangxing avatar Dec 10 '18 15:12 superzhangxing

@superzhangxing Thanks a lot! I also use a dynamic RNN and masked attention. I will run more comparison experiments and hope to get better results.
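For anyone following along, "masked attention" here means the softmax over attention scores is restricted to the real (unpadded) positions, so padding gets exactly zero weight. A minimal sketch (my own illustration, not this repo's code; `masked_attention` and `attend` are hypothetical names):

```python
import math

def masked_attention(scores, length):
    """Softmax over only the first `length` positions.

    Padded positions receive exactly zero weight, so padding cannot
    leak into the attended representation.
    """
    valid = scores[:length]
    m = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in valid]
    total = sum(exps)
    return [e / total for e in exps] + [0.0] * (len(scores) - length)

def attend(vectors, scores, length):
    """Attention-weighted sum of the first `length` vectors."""
    w = masked_attention(scores, length)
    dim = len(vectors[0])
    return [sum(w[i] * vectors[i][d] for i in range(length))
            for d in range(dim)]
```

In a TensorFlow implementation the same effect is usually achieved by adding a large negative value to the scores at padded positions before the softmax.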

MH23333 avatar Dec 11 '18 02:12 MH23333