Remove whitespace-only sentences in NewLineSentenceSegmenter

Open hiroshinoji opened this issue 7 years ago • 0 comments

NewLineSentenceSegmenter did not trim each segmented sentence, so whitespace-only segments reached the parser as empty sentences. For example, the following command always printed an error:

$ echo I live in Osaka . | java -Xmx4g -cp assembly.jar epic.parser.ParseText --model parsers/SpanModel-300.parser --sentences newline --tokens whitespace
(TOP (S (NP (PRP He) ) (VP (VBZ lives)  (PP (IN in)  (NP (NNP Osaka) )))))
### Could not tag Vector(), because No parse for Vector(): infinite partition... epic.parser.projections.ChartProjector$class.project(ChartProjector.scala:36);epic.parser.projections.AnchoredRuleMarginalProjector.project(EnumeratedAnchoring.scala:78)

I added a filter for empty sentences, as in MLSentenceSegmenter, which avoids this problem by trimming every sentence. With this change, the error is no longer printed.
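
The idea behind the fix can be sketched roughly as follows. This is a standalone illustration, not the code in this change: `NewlineSegmentSketch` and `segment` are hypothetical names, and the sketch returns plain strings rather than whatever span type epic's segmenters actually produce.

```scala
// Hypothetical stand-in for illustration only; not epic's NewLineSentenceSegmenter.
object NewlineSegmentSketch {

  /** Split on line breaks, trim each piece, and drop pieces that are
    * empty after trimming, so no empty sentence reaches the parser.
    */
  def segment(text: String): Vector[String] =
    text
      .split("\r?\n", -1)  // limit -1 keeps trailing empty pieces, like the one after echo's final newline
      .iterator
      .map(_.trim)         // whitespace-only pieces become ""
      .filter(_.nonEmpty)  // drop them instead of emitting an empty sentence
      .toVector
}
```

With the input from the command above, `NewlineSegmentSketch.segment("I live in Osaka .\n")` yields a single sentence, and the empty piece after the final newline is discarded instead of being handed to the parser as `Vector()`.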

hiroshinoji • Mar 01 '17 02:03