Richard Eckart de Castilho
We currently have no MeCab binaries for Windows.
HunPosTagger hangs on Windows waiting for output from the tagger binary.
Running a reader with a configuration like this:
```
import static org.apache.uima.fit.factory.CollectionReaderFactory.createReaderDescription;
import org.apache.uima.collection.CollectionReaderDescription;
import de.tudarmstadt.ukp.dkpro.core.io.text.TextReader;

CollectionReaderDescription reader = createReaderDescription(
        TextReader.class,
        TextReader.PARAM_PATTERNS, new String[] {
                "src/test/resources/texts/test1.txt",
                "src/test/resources/texts/test2.txt" });
```
produces filenames like these:
```
file%3A%2Fhome%2Fsomeuser%2Fgit%2Fdkpro-core%2Fdkpro-core-io-text-asl%2Fsrc%2Ftest%2Fresources%2Ftexts%2Ftest1.txt
file%3A%2Fhome%2Fsomeuser%2Fgit%2Fdkpro-core%2Fdkpro-core-io-text-asl%2Fsrc%2Ftest%2Fresources%2Ftexts%2Ftest2.txt
```
...
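These names are simply the files' `file:` URLs with the reserved characters percent-encoded (`%3A` is `:`, `%2F` is `/`). The sketch below is not the DKPro Core fix; it uses a hypothetical `EncodedNameDemo` class to decode such a name with the JDK's `URLDecoder` (Java 10+) and keep only the last path segment, which is presumably the name a user would expect.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class EncodedNameDemo {
    // Undo the percent-encoding, e.g. "%3A" -> ":" and "%2F" -> "/".
    // Note: URLDecoder implements form decoding, so a literal '+' would become a space.
    static String decode(String encodedName) {
        return URLDecoder.decode(encodedName, StandardCharsets.UTF_8);
    }

    // Parse the decoded string as a URI and keep only the last segment of its path.
    static String baseName(String encodedName) {
        String path = URI.create(decode(encodedName)).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String name = "file%3A%2Fhome%2Fsomeuser%2Fgit%2Fdkpro-core%2Fdkpro-core-io-text-asl"
                + "%2Fsrc%2Ftest%2Fresources%2Ftexts%2Ftest1.txt";
        // prints file:/home/someuser/git/dkpro-core/dkpro-core-io-text-asl/src/test/resources/texts/test1.txt
        System.out.println(decode(name));
        // prints test1.txt
        System.out.println(baseName(name));
    }
}
```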
Source: https://github.com/dkpro/dkpro-core/issues/619#issuecomment-280954963 Keep in mind that your Croatian example is not correct Croatian to begin with; the correct sentence would be something like this (roughly, "we have to do a very complicated example"):
> moramo odraditi vrlo kompliciran primjer...
The MateParser and MatePosTagger modules record internal tags in the tagset. These tags are never actually produced by the parser, so they should not be recorded.
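A minimal sketch of the intended behavior, assuming internal tags can be recognized by a simple rule: here, purely as an assumption, tags wrapped in angle brackets such as `<root>`. The `TagsetFilterDemo` class and the example tag list are hypothetical; the real criterion would have to come from the Mate models themselves.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class TagsetFilterDemo {
    // ASSUMPTION: internal tags are wrapped in angle brackets (e.g. "<root>").
    // The issue does not state how the internal tags are actually marked.
    static final Predicate<String> IS_INTERNAL = t -> t.startsWith("<") && t.endsWith(">");

    // Return the sorted set of tags worth recording, i.e. all tags except internal ones.
    static Set<String> recordableTags(List<String> modelTags) {
        return modelTags.stream()
                .filter(IS_INTERNAL.negate())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Made-up tag list for illustration only.
        List<String> tags = Arrays.asList("NN", "<root>", "VBZ", "DT");
        System.out.println(recordableTags(tags)); // [DT, NN, VBZ]
    }
}
```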
The English models for MaltParser use an input POS tag `PRT` which does not exist as a POS tag in the [Penn Treebank Tagset](http://www.clips.ua.ac.be/pages/mbsp-tags). `PRT` is actually a chunk tag...
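Mismatches like this can be spotted by comparing the tags a model expects against the Penn Treebank POS inventory. The sketch below is not part of DKPro Core: `PennTagCheck` is a hypothetical helper, and its input list is made up rather than read from the actual MaltParser model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PennTagCheck {
    // The 36 Penn Treebank POS tags plus the treebank's punctuation tags.
    static final Set<String> PTB_POS = new HashSet<>(Arrays.asList(
            "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD", "NN",
            "NNS", "NNP", "NNPS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR", "RBS", "RP",
            "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT", "WP", "WP$",
            "WRB", "#", "$", "``", "''", "-LRB-", "-RRB-", ",", ".", ":"));

    // Return the tags (in input order) that are not in the Penn Treebank POS tagset.
    static List<String> nonPennTags(List<String> modelTags) {
        List<String> unknown = new ArrayList<>();
        for (String tag : modelTags) {
            if (!PTB_POS.contains(tag)) {
                unknown.add(tag);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Made-up example input; a real check would read the tags from the model.
        List<String> modelTags = Arrays.asList("DT", "NN", "PRT", "VBZ", "RP");
        System.out.println(nonPennTags(modelTags)); // [PRT]
    }
}
```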
LanguageToolSegmenter chokes on "丁肇中":
```
Caused by: java.lang.IllegalStateException: Token [丁中] not found in sentence [丁肇中]
	at de.tudarmstadt.ukp.dkpro.core.languagetool.LanguageToolSegmenter.process(LanguageToolSegmenter.java:90)
```
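The message suggests that each token's text is looked up in the sentence text to compute offsets, which fails as soon as a token is not a verbatim substring of the sentence. The sketch below reproduces that failure mode under this assumption; `TokenAlignDemo` is hypothetical and is not the LanguageToolSegmenter code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenAlignDemo {
    // ASSUMPTION: offsets are found by searching each token's text in the sentence,
    // scanning left to right from the end of the previous match.
    // Returns one {begin, end} pair of character offsets per token.
    static List<int[]> align(String sentence, List<String> tokens) {
        List<int[]> offsets = new ArrayList<>();
        int cursor = 0;
        for (String token : tokens) {
            int begin = sentence.indexOf(token, cursor);
            if (begin < 0) {
                throw new IllegalStateException(
                        "Token [" + token + "] not found in sentence [" + sentence + "]");
            }
            offsets.add(new int[] { begin, begin + token.length() });
            cursor = begin + token.length();
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Works when every token is a substring of the sentence.
        System.out.println(align("丁肇中", Arrays.asList("丁", "肇中")).size()); // 2
        // Fails when a token is not a verbatim substring, as with [丁中] in the report.
        try {
            align("丁肇中", Arrays.asList("丁中"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```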
There is a problem here:
```
Caused by: java.lang.NullPointerException
	at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.util.TreeWithTokens.setTree(TreeWithTokens.java:54)
	at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.util.TreeWithTokens.<init>(TreeWithTokens.java:48)
	at de.tudarmstadt.ukp.dkpro.core.stanfordnlp.StanfordParser.process(StanfordParser.java:407)
	at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
	... 13 more
```
Apparently the tree object returned by the parser can...
Since the Stanford CoreNLP parser now supports dependency conversion to either the original Stanford Dependencies or Universal Dependencies, we must check whether the dependency tagset recording for English still properly...
The binaries for MeCab are packaged as models, but they should be packaged as binaries and have a corresponding artifactId, etc. Cf. other packages that use native binaries, e.g. TreeTagger,...