Richard Eckart de Castilho
@munterkalmsteiner I wonder if you somehow forgot about this? ;)
Which tokenizer? In brat or in DKPro Core?
Seems to be related to this issue in brat: "Allow any character in textbound annotation span" https://github.com/nlplab/brat/issues/819 But in our case, it is strange that the line break crops up...
AFAIK Windows is the only remaining system with two-character line break indicators (`\r\n`). Linux and OS X both use `\n` these days.
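To illustrate why this matters for offset-based annotations, here is a minimal sketch. It is not taken from brat or DKPro Core, and `LineBreaks` is a hypothetical helper name used only for this example. It shows that a Windows line break adds one extra character per line, and that normalizing to `\n` removes the difference.

```java
public final class LineBreaks {
    private LineBreaks() {}

    // Hypothetical helper: replace Windows (\r\n) and lone \r line breaks with \n.
    public static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windows = "first line\r\nsecond line";
        String unix = "first line\nsecond line";

        // After normalization both variants are identical.
        System.out.println(normalize(windows).equals(unix)); // prints "true"

        // The Windows variant is one character longer per line break,
        // so character offsets after the break are shifted by one.
        System.out.println(windows.length() - unix.length()); // prints "1"
    }
}
```

Whether normalizing is the right fix here depends on where the offsets are computed, so treat this only as a demonstration of the offset shift.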
Lol :) Unfortunately, our Windows-based build is currently failing before it reaches that module.
There is no tokenizer involved in `BratReaderWriterTest.testConll2009_2`. Actually there is not even a Conll2009 reader involved. If I remember correctly, then `testConll2009_2` basically takes the brat file produced in `testConll2009`,...
Getting a Windows-compatible build is surely the goal. After all, we finally set up a Windows-based build slave for that :)
Does this test still fail for you?
I think this is rather an issue with the BerkeleyParser implementation, not with DKPro Core. MaltParser chokes because it finds no POS tags. I'd recommend using a separate POS tagger...
The DKPro Core BerkeleyParser component internally uses the `CoarseToFineMaxRuleParser` from the Berkeley package. The package appears to include other parsers as well: `CoarseToFineMaxRuleDerivationParser`, `CoarseToFineMaxRuleProductParser`, `CoarseToFineNBestParser`, `CoarseToFineTwoChartsParser`, `ConstrainedTwoChartsParser`, `ConstrainedHierarchicalTwoChartParser`. I...