NeuralSum
Code for labeling the sentences
Could you share the rule-based system you used to label the sentences? I can't find it in this repo, and I'd like to apply it to some extra data. Thanks in advance.
@cheng6076, bumping this comment. Thanks!
Interested as well! In the current dataset, the sentence splits are partly broken: whenever a sentence does not end in a punctuation mark, it is merged into the following one. The original dataset at https://cs.nyu.edu/~kcho/DMQA/ has correct sentence splits (indicated by double newlines), so it would be great to re-run the labelling there.
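For anyone who wants to try this in the meantime, here is a minimal sketch of recovering the sentence splits, assuming only that sentences in the original story text are separated by blank lines (double newlines) as described above. The function name `split_sentences` and the sample text are made up for illustration; this is not the authors' preprocessing.

```python
import re
from typing import List


def split_sentences(story_text: str) -> List[str]:
    """Split raw story text into sentences on blank lines (double newlines)."""
    # Assumption: a blank line (possibly containing stray whitespace)
    # marks a sentence boundary, regardless of final punctuation.
    chunks = re.split(r"\n\s*\n", story_text)
    # Collapse internal whitespace and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


# Hypothetical sample: the first and last sentences lack final punctuation.
sample = "First sentence without final punctuation\n\nSecond sentence.\n\nThird one"
sentences = split_sentences(sample)
print(sentences)
```

Because the split relies on blank lines rather than punctuation, a sentence that does not end in a punctuation mark should still stay separate from the one after it.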
I would be interested in re-running the labeling on a non-anonymized version of the dataset: https://github.com/abisee/cnn-dailymail
Actually, I'm a little confused about how the labeling system was trained. According to the paper "Neural Summarization by Extracting Sentences and Words", the model uses a rule-based system adjusted on "9,000 documents with manual sentence labels created by Woodsend and Lapata (2010)". I checked the original data (http://homepages.inf.ed.ac.uk/mlap/resources/cnnhlights/), but there seem to be only 216 labeled training examples. If the original labeled data were available, we could tune the system on our own. I have e-mailed Cheng but haven't heard back yet.