berkeley-doc-summarizer
Alex
The joint model (COREF+NER+WIKI) of the Berkeley Entity Resolution System combines the output for all input documents (e.g. government.txt and music.txt) into a single file, output.conll. The output produced by the other models, meanwhile, does not exactly match the test files in the Berkeley Document Summarizer (e.g. the last two columns of government.txt are off). I would appreciate a clarification of the assumed data interface between the Berkeley Entity Resolution System and the Berkeley Document Summarizer.
Greg clarified that the utility class edu.berkeley.nlp.entity.preprocess.ConllDocSharder can be used for this splitting, i.e. to break the combined output.conll back into one file per document.
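For anyone who wants to see what such a split amounts to, here is a minimal Python sketch. It is not the ConllDocSharder implementation and does not call it. It assumes that output.conll follows the usual CoNLL-2011/2012 layout, where each document is wrapped in `#begin document (<name>); part <NNN>` and `#end document` lines. The function names `split_conll` and `write_shards` and the output file naming are hypothetical and chosen only for illustration.

```python
import re
from pathlib import Path

# Assumed header format: "#begin document (<name>); part <NNN>"
BEGIN_RE = re.compile(r"^#begin document \((.*?)\); part (\d+)")


def split_conll(text):
    """Return a list of (doc_name, part, block_text) tuples, one per document block."""
    docs = []
    current = None  # [name, part, lines] while inside a document block
    for line in text.splitlines():
        match = BEGIN_RE.match(line)
        if match:
            current = [match.group(1), match.group(2), [line]]
        elif current is not None:
            current[2].append(line)
            if line.startswith("#end document"):
                block = "\n".join(current[2]) + "\n"
                docs.append((current[0], current[1], block))
                current = None
    return docs


def write_shards(conll_path, out_dir):
    """Write each document block to its own file and return the written paths.

    The "<name>.part<NNN>.conll" naming is an assumption, not what
    ConllDocSharder necessarily produces.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(conll_path).read_text(encoding="utf-8")
    written = []
    for name, part, block in split_conll(text):
        target = out / f"{Path(name).name}.part{part}.conll"
        target.write_text(block, encoding="utf-8")
        written.append(target)
    return written
```

Calling `write_shards("output.conll", "shards")` would then leave one file per document block in the `shards` directory. Whether the resulting columns match what the Berkeley Document Summarizer expects still depends on which model produced output.conll, so the column mismatch mentioned above is a separate question.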