NeuralSum

Code for labeling the sentences

Open xiang-deng opened this issue 6 years ago • 4 comments

Could you share the rule-based system you used to label the sentences? I can't find it in this repo, and I want to apply it to some extra data. Thanks in advance.

xiang-deng avatar Apr 12 '18 12:04 xiang-deng

@cheng6076, bumping this issue. Thanks!

AlJohri avatar Apr 12 '18 23:04 AlJohri

Interested as well! In the current dataset, the sentence splits are partly broken: whenever a sentence does not end in a punctuation mark, it is merged with the following one. The original dataset at https://cs.nyu.edu/~kcho/DMQA/ has correct sentence splits (indicated by double newlines), so it would be great to re-run the labelling on it.
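For illustration, here is a minimal sketch of recovering sentence boundaries from text in which sentences are separated by double newlines, as described above. The function name `split_sentences` and the sample text are hypothetical, and the exact layout of the original story files is an assumption, so this may need adjusting for the real data.

```python
import re
from typing import List


def split_sentences(story_text: str) -> List[str]:
    """Split text into sentences, treating blank lines as boundaries.

    Assumption: each sentence is separated from the next by at least
    one empty line (a double newline), regardless of final punctuation.
    """
    # Normalise Windows line endings so the pattern below stays simple.
    normalized = story_text.replace("\r\n", "\n")
    # A boundary is a newline, optional whitespace, then another newline.
    parts = re.split(r"\n\s*\n", normalized)
    # Collapse internal whitespace and drop empty fragments.
    return [" ".join(part.split()) for part in parts if part.strip()]


# Hypothetical sample: the first and last sentences lack final punctuation.
sample = "First sentence without final punctuation\n\nSecond sentence.\n\nThird one"
sentences = split_sentences(sample)
```

Because the split relies only on blank lines, a sentence without a final punctuation mark stays separate from the one that follows it.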

f0k avatar Apr 24 '18 15:04 f0k

I would be interested in re-running the labeling on a non-anonymized version of the dataset: https://github.com/abisee/cnn-dailymail

AlJohri avatar Apr 24 '18 17:04 AlJohri

Actually, I'm a little confused about how the labeling system was tuned. According to the paper “Neural summarization by extracting sentences and words”, the model uses a rule-based system adjusted on “9,000 documents with manual sentence labels created by Woodsend and Lapata (2010)”. I checked the original data (http://homepages.inf.ed.ac.uk/mlap/resources/cnnhlights/), but there seem to be only 216 labeled training examples. If the original labeled data were available, we could tune the system on our own. I have e-mailed Cheng but haven't heard back yet.

xiang-deng avatar Apr 24 '18 17:04 xiang-deng