nlp4j
nlp4j copied to clipboard
EnglishC2DConverter generates ArrayIndexOutOfBoundsException
I am unable to get the EnglishC2DConverter working. The following lines reproduce the problem.
// This is an example from "src/test/resources/constituent/functionTags.parse"
String pennTree = "(TOP (S (NP-SBJ (NP (CC both) (NNP Bush) (CC and) (NNP Rice))) (VP "
+ "(VBP have) (VP (VBN delivered) (NP (NP (NNS speeches)) (, ,) (SBAR (WHNP-1 "
+ "(WDT which)) (S (NP-SBJ (-NONE- *T*-1)) (VP (VBP are) (ADJP-PRD (RB very) "
+ "(JJ clear))))))))))";
CTReader reader = new CTReader();
reader.open(new StringInputStream(pennTree), "UTF-8");
List<CTTree> trees = reader.getTreeList();
System.out.println(trees);
C2DConverter c2d = new EnglishC2DConverter(new HeadRuleMap(getClass()
.getResourceAsStream("/edu/emory/mathcs/nlp/conversion/headrule_en_stanford.txt")));
c2d.toDependencyGraph(trees.get(0));
This code prints an ArrayIndexOutOfBoundsException:
java.lang.ArrayIndexOutOfBoundsException: 12
at edu.emory.mathcs.nlp.conversion.C2DConverter.initDEPTree(C2DConverter.java:195)
at edu.emory.mathcs.nlp.conversion.EnglishC2DConverter.getDEPTree(EnglishC2DConverter.java:1129)
at edu.emory.mathcs.nlp.conversion.EnglishC2DConverter.toDependencyGraph(EnglishC2DConverter.java:136)
Is the converter broken?
As a side note, the converter "swallows" the generated exception in EnglishC2DConverter.toDependencyGraph(CTTree)
- i.e. it prints a stack trace and doesn't forward the exception to the calling code. IMHO it would be better if it did not print the stack trace and instead let the exception bubble up to the calling code.
Richard,
It's because we were in the middle of fixing the code. I temporarily made a separate project to provide the conversion script so please take a look:
http://github.com/emorynlp/ddr
Please let me know if you have more questions. Thanks.
I stumbled across this C2DConverter when I was looking where the semantic and secondary heads in the NLPNode come from - and then I was looking into wrapping that conversion as a new NLP4J UIMA component in DKPro Core. However, if it is known to be broken in the last NLP4J release, then I'd better wait for the next NLP4J release on Maven Central which then probably (?) will also include the new converter?