CoreNLP
Shift Reduce Parser builds a wrong dependency tree
Given the sentence "Aspirin causes bleeding and Ibuprofen reduces inflammation.", the new SR parser (the recommended one) builds the wrong dependency tree, while the former (default) parser builds it correctly.
Code to reproduce it (version 4.4.0)
import java.util.Properties
import edu.stanford.nlp.pipeline._
import edu.stanford.nlp.semgraph._
import edu.stanford.nlp.semgraph.semgrex._
import scala.collection.JavaConverters._
With the SR parser:
case object CNLPFullSR {
  // Pipeline configured to use the shift-reduce (SR) parser model
  lazy val pipeline: StanfordCoreNLP = {
    val props = new Properties()
    props.setProperty(
      "annotators",
      "tokenize, ssplit, pos, lemma, parse"
    )
    props.setProperty("tokenize.options", "splitHyphenated=false")
    props.setProperty("parse.originalDependencies", "true")
    props.setProperty("parse.extradependencies", "MAXIMAL")
    props.setProperty(
      "parse.model",
      "edu/stanford/nlp/models/srparser/englishSR.ser.gz"
    )
    new StanfordCoreNLP(props)
  }
}
val text = "Aspirin causes bleeding and Ibuprofen reduces inflammation."
val doc = new CoreDocument(text)
CNLPFullSR.pipeline.annotate(doc)
val sentence = doc.sentences().get(0)
@ sentence.dependencyParse
res22: SemanticGraph = -> causes/VBZ (root)
-> Aspirin/NN (nsubj)
-> reduces/VBZ (ccomp)
-> bleeding/NN (nsubj)
-> and/CC (cc)
-> Ibuprofen/NNP (conj)
-> inflammation/NN (dobj)
-> ./. (punct)
And if we remove the property for the SR model and try it again with the default parser:
case object CNLPFullDEF {
  // Same configuration, but without parse.model, so the default
  // (PCFG) parser model is used
  lazy val pipeline: StanfordCoreNLP = {
    val props = new Properties()
    props.setProperty(
      "annotators",
      "tokenize, ssplit, pos, lemma, parse"
    )
    props.setProperty("tokenize.options", "splitHyphenated=false")
    props.setProperty("parse.originalDependencies", "true")
    props.setProperty("parse.extradependencies", "MAXIMAL")
    new StanfordCoreNLP(props)
  }
}
val text = "Aspirin causes bleeding and Ibuprofen reduces inflammation."
val doc = new CoreDocument(text)
CNLPFullDEF.pipeline.annotate(doc)
val sentence = doc.sentences().get(0)
@ sentence.dependencyParse
res28: SemanticGraph = -> causes/VBZ (root)
-> Aspirin/NN (nsubj)
-> bleeding/NN (dobj)
-> and/CC (cc)
-> reduces/VBZ (conj)
-> Ibuprofen/NNP (nsubj)
-> inflammation/NN (dobj)
-> ./. (punct)
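Since the semgrex imports are already in scope at the top, one way to check which of the two structures a pipeline produced is to test for the clause-level conjunction directly. This is only a rough sketch on my side, assuming the sentence value from either run above; the pattern simply encodes the conj(causes, reduces) edge that the correct parse has.

import edu.stanford.nlp.semgraph.semgrex.SemgrexPattern

// In the correct analysis, "causes" governs "reduces" via a conj edge
val clauseCoordination =
  SemgrexPattern.compile("{word:causes} >conj {word:reduces}")

// true for the default (PCFG) parse above, false for the SR parse
val matched = clauseCoordination.matcher(sentence.dependencyParse()).find()
println(s"clause-level coordination found: $matched")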
If more checks are needed, I'm happy to help.
I agree that the PCFG parser is getting this one right and the SR parser is not, but ultimately the PCFG has an F1 of 87 and the SR 90, and their errors will be like a box of chocolates. Or something like that; it's been a while since I've watched that movie.
I was going to recommend stanza for depparse, but it doesn't seem to do well with this particular sentence either. The stanza constituency parser is correct, at least.
One thing we can do is add this exact example to the training data and rebuild the models. If you come up with other examples where it gets the wrong parse, please let us know.
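In case it helps with collecting more examples, here is a rough sketch (not an official utility; the extra candidate sentence is just a made-up placeholder) that runs both of the pipelines from the report above and flags sentences where they disagree:

// Flag sentences where the SR and default pipelines disagree.
// CNLPFullSR / CNLPFullDEF are the objects defined in the report above;
// the second candidate sentence is a made-up placeholder.
val candidates = Seq(
  "Aspirin causes bleeding and Ibuprofen reduces inflammation.",
  "Caffeine raises alertness and alcohol impairs judgment."
)

for (text <- candidates) {
  val srDoc  = new CoreDocument(text)
  val defDoc = new CoreDocument(text)
  CNLPFullSR.pipeline.annotate(srDoc)
  CNLPFullDEF.pipeline.annotate(defDoc)

  // toList prints one typed dependency per line, so a plain string
  // comparison is enough to spot a difference between the two parses
  val srDeps  = srDoc.sentences().get(0).dependencyParse().toList
  val defDeps = defDoc.sentences().get(0).dependencyParse().toList

  if (srDeps != defDeps) println(s"Parsers disagree on: $text")
}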