berkeley-doc-summarizer

Issue with formats used

Status: Open. h22roscoe opened this issue 7 years ago • 3 comments

Hi @gregdurrett

I am currently using the main method of the Entity Preprocessing Driver to turn my plain .txt files into the (CoNLL?) format that this summarizer understands. However, the ConllReader class used by the Summarizer class cannot parse some of the generated lines: parsing fails in the assembleConstTree method because some lines appear to be missing a "*".

Could you shed more light on the CoNLL format that the summarizer expects?
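For readers hitting the same error, here is a hedged illustration. It assumes the summarizer reads the CoNLL-2012 shared-task layout used by berkeley-entity: one token per line, whitespace-separated columns, with the sixth column holding the parse bit, which always contains a "*" standing for the token. The names parse_token_line and CONLL_2012_COLUMNS are hypothetical and not part of the summarizer. The sketch shows how a document ID containing a space shifts every column by one, so the column read as the parse bit has no "*".

```python
# Fixed leading columns of the CoNLL-2012 format (assumption: ConllReader
# follows this layout). After these come zero or more predicate-argument
# columns, and the coreference column is always last.
CONLL_2012_COLUMNS = [
    "doc_id", "part", "word_num", "word", "pos", "parse_bit",
    "pred_lemma", "frameset_id", "word_sense", "speaker", "named_entities",
]


def parse_token_line(line):
    """Split one token line on whitespace and name its columns (hypothetical helper)."""
    fields = line.split()
    # The fixed columns plus the final coreference column must all be present.
    if len(fields) < len(CONLL_2012_COLUMNS) + 1:
        raise ValueError(f"too few columns: {len(fields)}")
    token = dict(zip(CONLL_2012_COLUMNS, fields))
    token["coref"] = fields[-1]
    # Every parse-bit fragment contains a "*" marking the word itself.
    if "*" not in token["parse_bit"]:
        raise ValueError(f"parse bit {token['parse_bit']!r} has no '*'")
    return token


good = "my_doc 0 0 Hello UH (TOP(INTJ*)) - - - - * -"
bad = "my doc 0 0 Hello UH (TOP(INTJ*)) - - - - * -"  # space in the doc ID
```

With `bad`, the extra token pushes "UH" into the parse-bit position, which is the kind of missing-"*" failure described above.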

Thanks, Harry

h22roscoe, Jun 09 '17 14:06

OK, I have resolved this issue by making sure the docName contains no whitespace characters, but now I get a warning that there are no gold mentions in the document.
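Since removing whitespace from the docName fixed the parse error, one way to do that automatically before preprocessing is sketched below. It assumes the docName comes from something you control, such as the input file name (check how the driver actually derives it). sanitize_doc_name and begin_document_line are hypothetical helpers; the header follows the CoNLL-2012 "#begin document (name); part 000" convention.

```python
import re


def sanitize_doc_name(name):
    """Replace each run of whitespace with one underscore so the document ID
    stays a single whitespace-free token on every CoNLL line."""
    cleaned = re.sub(r"\s+", "_", name.strip())
    if not cleaned:
        raise ValueError("document name is empty after sanitizing")
    return cleaned


def begin_document_line(name, part=0):
    """Build a CoNLL-2012-style '#begin document' header for a sanitized name."""
    return f"#begin document ({sanitize_doc_name(name)}); part {part:03d}"
```

Replacing whitespace rather than deleting it keeps names like "report v2" and "reportv2" distinct.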

h22roscoe, Jun 09 '17 16:06

Hi Harry,

Glad you figured out the first issue; I guess that should be documented.

The "no gold mentions" warning is normal: this isn't gold coreference data, so we don't expect to have gold labels. (I cribbed a bunch of code from the berkeley-entity system, which was originally a Berkeley coreference system that did expect gold coreference information everywhere.)
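To see why the warning fires, recall that in CoNLL-2012 files the last column marks coreference mentions: "(7" opens a mention in cluster 7, "7)" closes it, "(7)" is a one-token mention, and "-" means no mention. Preprocessed plain text presumably has "-" throughout that column (an assumption; inspect your generated files), so a count like the hypothetical one below comes out to zero. This is an illustration, not the summarizer's own check.

```python
def count_gold_mentions(coref_column):
    """Count mentions opened in a CoNLL coreference column.

    Each mention start is written as '(' plus a cluster ID, e.g. '(7', '(7)',
    or '(3)|(7' for several mentions on one token; '-' means none."""
    return sum(value.count("(") for value in coref_column if value != "-")
```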

Greg

gregdurrett, Jun 09 '17 22:06

Thanks, Greg

h22roscoe, Jun 13 '17 12:06