
Shift Reduce Parser builds a wrong dependency tree

Open · mkarmona opened this issue 3 years ago · 1 comment

Given the sentence "Aspirin causes bleeding and Ibuprofen reduces inflammation.", the new SR parser (the suggested one) builds the wrong dependency tree, while the former (default) parser builds it correctly.

Code to reproduce it (version 4.4.0):

import java.util.Properties
import edu.stanford.nlp.pipeline._
import edu.stanford.nlp.semgraph._
import edu.stanford.nlp.semgraph.semgrex._
import scala.collection.JavaConverters._

With the SR parser:

// Pipeline configured to use the shift-reduce (SR) parser model
case object CNLPFullSR {
  lazy val value: StanfordCoreNLP = {
    val props = new Properties()
    props.setProperty(
      "annotators",
      "tokenize, ssplit, pos, lemma, parse"
    )
    props.setProperty("tokenize.options", "splitHyphenated=false")
    props.setProperty("parse.originalDependencies", "true")
    props.setProperty("parse.extradependencies", "MAXIMAL")
    props.setProperty(
      "parse.model",
      "edu/stanford/nlp/models/srparser/englishSR.ser.gz"
    )

    val pipeline = new StanfordCoreNLP(props)

    pipeline
  }
}

val text = "Aspirin causes bleeding and Ibuprofen reduces inflammation."
val doc = new CoreDocument(text)
CNLPFullSR.value.annotate(doc)
val sentence = doc.sentences().get(0)
@ sentence.dependencyParse
res22: SemanticGraph = -> causes/VBZ (root)
  -> Aspirin/NN (nsubj)
  -> reduces/VBZ (ccomp)
    -> bleeding/NN (nsubj)
      -> and/CC (cc)
      -> Ibuprofen/NNP (conj)
    -> inflammation/NN (dobj)
  -> ./. (punct)

And if we remove the parse.model property, so the pipeline falls back to the default (PCFG) parser, and try again:

// Same pipeline, but with the default (PCFG) parser
case object CNLPFullDEF {
  lazy val value: StanfordCoreNLP = {
    val props = new Properties()
    props.setProperty(
      "annotators",
      "tokenize, ssplit, pos, lemma, parse"
    )
    props.setProperty("tokenize.options", "splitHyphenated=false")
    props.setProperty("parse.originalDependencies", "true")
    props.setProperty("parse.extradependencies", "MAXIMAL")

    val pipeline = new StanfordCoreNLP(props)

    pipeline
  }
}

val text = "Aspirin causes bleeding and Ibuprofen reduces inflammation."
val doc = new CoreDocument(text)
CNLPFullDEF.value.annotate(doc)
val sentence = doc.sentences().get(0)
@ sentence.dependencyParse
res28: SemanticGraph = -> causes/VBZ (root)
  -> Aspirin/NN (nsubj)
  -> bleeding/NN (dobj)
  -> and/CC (cc)
  -> reduces/VBZ (conj)
    -> Ibuprofen/NNP (nsubj)
    -> inflammation/NN (dobj)
  -> ./. (punct)
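The structural difference between the two analyses can also be checked mechanically. Below is a minimal sketch in plain Python (independent of CoreNLP; the edge lists are transcribed from the two SemanticGraph dumps above, punctuation edges omitted, and `coordinates_clauses` is just an illustrative helper, not a CoreNLP API):

```python
# Each parse as a set of (head, relation, dependent) edges,
# transcribed from the SemanticGraph output above.
sr_parse = {
    ("ROOT", "root", "causes"),
    ("causes", "nsubj", "Aspirin"),
    ("causes", "ccomp", "reduces"),       # SR: 'reduces' demoted to a clausal complement
    ("reduces", "nsubj", "bleeding"),
    ("bleeding", "cc", "and"),
    ("bleeding", "conj", "Ibuprofen"),
    ("reduces", "dobj", "inflammation"),
}

pcfg_parse = {
    ("ROOT", "root", "causes"),
    ("causes", "nsubj", "Aspirin"),
    ("causes", "dobj", "bleeding"),
    ("causes", "cc", "and"),
    ("causes", "conj", "reduces"),        # PCFG: the two verbs are coordinated
    ("reduces", "nsubj", "Ibuprofen"),
    ("reduces", "dobj", "inflammation"),
}

def coordinates_clauses(parse):
    """True if the two verbs are coordinated and each clause keeps
    its own subject and object."""
    required = {
        ("causes", "conj", "reduces"),
        ("causes", "dobj", "bleeding"),
        ("reduces", "nsubj", "Ibuprofen"),
    }
    return required <= parse

print(coordinates_clauses(sr_parse))    # False
print(coordinates_clauses(pcfg_parse))  # True
```

A check like this could serve as a regression test if the SR model is retrained on this example.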

If more checks are needed, I'm happy to help.

mkarmona avatar May 24 '22 15:05 mkarmona

I agree that the PCFG parser is getting this one right and the SR is not, but ultimately the PCFG has an F1 of 87 and the SR has 90, and their errors will be like a box of chocolates. Or something like that; it's been a while since I've watched that movie.

I was going to recommend stanza for depparse, but it doesn't seem to do well with this particular sentence either. The stanza constituency parser is correct, at least.

One thing we can do is add this exact example to the training data and rebuild the models. If you come up with other examples where it gets the wrong parse, please let us know.

AngledLuffa avatar May 24 '22 23:05 AngledLuffa