nlp4j EnglishC2DConverter generates ArrayIndexOutOfBoundsException

I am unable to get the EnglishC2DConverter working. The following lines reproduce the problem.

        // This is an example from "src/test/resources/constituent/functionTags.parse"
        String pennTree = "(TOP (S (NP-SBJ (NP (CC both) (NNP Bush) (CC and) (NNP Rice))) (VP "
                + "(VBP have) (VP (VBN delivered) (NP (NP (NNS speeches)) (, ,) (SBAR (WHNP-1 "
                + "(WDT which)) (S (NP-SBJ (-NONE- *T*-1)) (VP (VBP are) (ADJP-PRD (RB very) "
                + "(JJ clear))))))))))";
        
        CTReader reader = new CTReader();
        reader.open(new StringInputStream(pennTree), "UTF-8");
        List<CTTree> trees = reader.getTreeList();
        System.out.println(trees);
        
        C2DConverter c2d = new EnglishC2DConverter(new HeadRuleMap(getClass()
                .getResourceAsStream("/edu/emory/mathcs/nlp/conversion/headrule_en_stanford.txt")));
        
        c2d.toDependencyGraph(trees.get(0));

This code prints an ArrayIndexOutOfBoundsException:

java.lang.ArrayIndexOutOfBoundsException: 12
	at edu.emory.mathcs.nlp.conversion.C2DConverter.initDEPTree(C2DConverter.java:195)
	at edu.emory.mathcs.nlp.conversion.EnglishC2DConverter.getDEPTree(EnglishC2DConverter.java:1129)
	at edu.emory.mathcs.nlp.conversion.EnglishC2DConverter.toDependencyGraph(EnglishC2DConverter.java:136)

Is the converter broken?

As a side note, the converter "swallows" the generated exception in EnglishC2DConverter.toDependencyGraph(CTTree) - i.e. it prints a stack trace and doesn't forward the exception to the calling code. IMHO it would be better if it did not print the stack trace and instead let the exception bubble up to the calling code.

Jan 15 '17 12:01 reckart

Richard,

It's because we were in the middle of fixing the code. I temporarily made a separate project to provide the conversion script so please take a look:

http://github.com/emorynlp/ddr

Please let me know if you have more questions. Thanks.

Jan 16 '17 12:01 jdchoi77

I stumbled across this C2DConverter when I was looking where the semantic and secondary heads in the NLPNode come from - and then I was looking into wrapping that conversion as a new NLP4J UIMA component in DKPro Core. However, if it is known to be broken in the last NLP4J release, then I'd better wait for the next NLP4J release on Maven Central which then probably (?) will also include the new converter?

Jan 16 '17 14:01 reckart

nlp4j nlp4j copied to clipboard

EnglishC2DConverter generates ArrayIndexOutOfBoundsException

nlp4j
nlp4j copied to clipboard