berkeley-doc-summarizer
Preparing the dataset
Your instructions mention:
To prepare the dataset, first you need to extract all the XML files from 2003-2007 and flatten them into a single directory
Is 2003-2007 referring to train_corefner_standoff or train_abstracts_standoff?
Within each of these directories, the files don't appear to be in XML format.
Not sure how to do the aforementioned step...
Oh I see, it looks like you're referring to a NYT dataset that costs $300 for those who aren't members of the Linguistic Data Consortium. Yikes. My university canceled its membership unfortunately :(
EDIT: Whoops, just found the part that says:
The system is distributed with several pre-trained variants
As far as I can tell, I should be able to run one of the pre-trained summarizers without the NYT dataset.
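For anyone who does have the corpus, here is a rough sketch of how the flattening step quoted above might look. This is only my guess at the intent: it assumes the XML files have already been extracted into per-year subdirectories (e.g. `2003/`, `2004/`, ...) under one root, and the function name `flatten_xml` and the directory layout are hypothetical, not something taken from the repo.

```python
import shutil
from pathlib import Path


def flatten_xml(src_root, dest_dir, years=range(2003, 2008)):
    """Copy every .xml file under src_root/<year>/ into one flat directory.

    Assumes (hypothetically) that the corpus is already extracted into
    per-year subdirectories named 2003, 2004, ... under src_root.
    """
    src_root, dest_dir = Path(src_root), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for year in years:
        year_dir = src_root / str(year)
        if not year_dir.is_dir():
            continue  # skip years that are not present
        # rglob walks all nested month/day subdirectories
        for xml_path in sorted(year_dir.rglob("*.xml")):
            target = dest_dir / xml_path.name
            if target.exists():
                # Flattening loses the directory structure, so refuse to
                # silently overwrite files that share a name.
                raise FileExistsError(f"name collision: {target}")
            shutil.copy2(xml_path, target)
            copied.append(target)
    return copied
```

Calling something like `flatten_xml("nyt_extracted", "nyt_flat")` (both paths made up) would then leave all the 2003-2007 XML files in a single directory.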