Web gui gives different results to server api for enhanced++ dependencies

Open mercurial-moon opened this issue 1 year ago • 5 comments

I have an offline CoreNLP server running, and also a local version of corenlp.run. I sent a query from the web browser with the following options: parts-of-speech, named entities, dependency parse, constituency parse, lemmas, coreference.

The query is

Only the columns to be modified need be mentioned in the SET clause; columns not explicitly modified retain their previous values.

Output

image: https://github.com/user-attachments/assets/1f384568-df86-4f68-b478-c393719720a0

The word "modified" shows 3 dependencies:

nsubj:pass - columns
mark - to
aux:pass - be

If I send the same query via the server API, as below:

wget --post-data "Only the columns to be modified need be mentioned in the SET clause; columns not explicitly modified retain their previous values." "http://corenlp.com:9000/?properties={\"annotators\":\"tokenize,ssplit,pos,lemma,ner,parse,dcoref\", \"outputFormat\": \"xml\"}" -O corenlp1.xml

then I get the following xml file

<dependencies type="enhanced-plus-plus-dependencies">
          <dep type="root">
            <governor idx="0">ROOT</governor>
            <dependent idx="9">mentioned</dependent>
          </dep>
          <dep type="advmod">
            <governor idx="3">columns</governor>
            <dependent idx="1">Only</dependent>
          </dep>
          <dep type="det">
            <governor idx="3">columns</governor>
            <dependent idx="2">the</dependent>
          </dep>
          <dep type="nsubj:pass">
            <governor idx="9">mentioned</governor>
            <dependent idx="3">columns</dependent>
          </dep>
          <dep type="mark">
            <governor idx="6">modified</governor>
            <dependent idx="4">to</dependent>
          </dep>
          <dep type="aux:pass">
            <governor idx="6">modified</governor>
            <dependent idx="5">be</dependent>
          </dep>
          <dep type="acl:to">
            <governor idx="3">columns</governor>
            <dependent idx="6">modified</dependent>
          </dep>
          <dep type="aux">
            <governor idx="9">mentioned</governor>
            <dependent idx="7">need</dependent>
          </dep>
          <dep type="aux:pass">
            <governor idx="9">mentioned</governor>
            <dependent idx="8">be</dependent>
          </dep>
          <dep type="case">
            <governor idx="13">clause</governor>
            <dependent idx="10">in</dependent>
          </dep>
          <dep type="det">
            <governor idx="13">clause</governor>
            <dependent idx="11">the</dependent>
          </dep>
          <dep type="compound">
            <governor idx="13">clause</governor>
            <dependent idx="12">SET</dependent>
          </dep>
          <dep type="obl:in">
            <governor idx="9">mentioned</governor>
            <dependent idx="13">clause</dependent>
          </dep>
          <dep type="punct">
            <governor idx="9">mentioned</governor>
            <dependent idx="14">;</dependent>
          </dep>
          <dep type="nsubj">
            <governor idx="19">retain</governor>
            <dependent idx="15">columns</dependent>
          </dep>
          <dep type="advmod">
            <governor idx="18">modified</governor>
            <dependent idx="16">not</dependent>
          </dep>
          <dep type="advmod">
            <governor idx="18">modified</governor>
            <dependent idx="17">explicitly</dependent>
          </dep>
          <dep type="acl">
            <governor idx="15">columns</governor>
            <dependent idx="18">modified</dependent>
          </dep>
          <dep type="parataxis">
            <governor idx="9">mentioned</governor>
            <dependent idx="19">retain</dependent>
          </dep>
          <dep type="nmod:poss">
            <governor idx="22">values</governor>
            <dependent idx="20">their</dependent>
          </dep>
          <dep type="amod">
            <governor idx="22">values</governor>
            <dependent idx="21">previous</dependent>
          </dep>
          <dep type="obj">
            <governor idx="19">retain</governor>
            <dependent idx="22">values</dependent>
          </dep>
          <dep type="punct">
            <governor idx="9">mentioned</governor>
            <dependent idx="23">.</dependent>
          </dep>
        </dependencies>

Here, 2 of the 3 dependencies are correct, but 1 of them wrongly shows the governor as "mentioned" instead of "modified":

          <dep type="nsubj:pass">
            <governor idx="9">mentioned</governor>
            <dependent idx="3">columns</dependent>
          </dep>
          <dep type="mark">
            <governor idx="6">modified</governor>
            <dependent idx="4">to</dependent>
          </dep>
          <dep type="aux:pass">
            <governor idx="6">modified</governor>
            <dependent idx="5">be</dependent>
          </dep>
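
To check edges like these programmatically rather than by eye, a minimal Python sketch follows. It assumes the API response uses the XML shape quoted above; the helper names `parse_edges` and `edges_governed_by` are made up for illustration and are not part of CoreNLP.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the XML output quoted in this issue.
XML_FRAGMENT = """
<dependencies type="enhanced-plus-plus-dependencies">
  <dep type="nsubj:pass">
    <governor idx="9">mentioned</governor>
    <dependent idx="3">columns</dependent>
  </dep>
  <dep type="mark">
    <governor idx="6">modified</governor>
    <dependent idx="4">to</dependent>
  </dep>
  <dep type="aux:pass">
    <governor idx="6">modified</governor>
    <dependent idx="5">be</dependent>
  </dep>
</dependencies>
"""


def parse_edges(xml_text):
    """Return (relation, governor_idx, governor, dependent_idx, dependent) tuples."""
    root = ET.fromstring(xml_text)
    edges = []
    for dep in root.iter("dep"):
        governor = dep.find("governor")
        dependent = dep.find("dependent")
        edges.append((
            dep.get("type"),
            int(governor.get("idx")), governor.text,
            int(dependent.get("idx")), dependent.text,
        ))
    return edges


def edges_governed_by(edges, governor_idx):
    """Keep only the edges whose governor has the given token index."""
    return [edge for edge in edges if edge[1] == governor_idx]


if __name__ == "__main__":
    # Token 6 is "modified"; the nsubj:pass edge is missing from this list
    # because the API output attaches "columns" to token 9 ("mentioned").
    for edge in edges_governed_by(parse_edges(XML_FRAGMENT), 6):
        print(edge)
```

Filtering on the token index rather than the word avoids confusing the two occurrences of "modified" (tokens 6 and 18) in this sentence.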

There are some other kinds of mismatches that also show up, e.g. for the sentence "The crying cat is mine." the web GUI shows the dependency type for "crying" as "amod", while the API shows it as "compound", with the governor "cat" in both cases.

mercurial-moon avatar Oct 18 '24 06:10 mercurial-moon

corenlp.run is a slightly out of date version :shrug:

AngledLuffa avatar Oct 18 '24 06:10 AngledLuffa

Hi, this is an offline CoreNLP that I'm running locally. Please refer to issue https://github.com/stanfordnlp/CoreNLP/issues/1356. As I said, I'm running the same query on the same server: once via the web GUI (local) and once via the API. Both are run against the same local server, but the XML output from the API gives a different result from the visual result in the web GUI. My CoreNLP version is 4.5.3.

mercurial-moon avatar Oct 18 '24 06:10 mercurial-moon

Gotcha. I'll stop being dismissive and take a look tomorrow

AngledLuffa avatar Oct 18 '24 06:10 AngledLuffa

Hi, any updates...

mercurial-moon avatar Oct 19 '24 12:10 mercurial-moon

Not yet. Please be patient - there is only one of me doing any debugging

AngledLuffa avatar Oct 19 '24 15:10 AngledLuffa

Looking this over: is this a case where the server has the direct-to-dependencies parser enabled by default, but the wget command you used asks for the "parse" annotator, which means it parses to constituencies and then converts those to dependencies?

AngledLuffa avatar Oct 22 '24 02:10 AngledLuffa

Not sure what that means... but I run the server using a batch file that has the following command in it

start "" java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

mercurial-moon avatar Oct 22 '24 04:10 mercurial-moon

When you do, what's the debugging output for the first query? I suspect it will have "depparse" instead of just "parse" in it.

AngledLuffa avatar Oct 22 '24 04:10 AngledLuffa

From my program I send the following annotators to the api

"tokenize", "ssplit", "pos", "lemma", "ner", "parse", "dcoref"

and

outputFormat: xml

mercurial-moon avatar Oct 22 '24 05:10 mercurial-moon

Debug of first run

[main] INFO CoreNLP - --- StanfordCoreNLPServer#main() called ---
[main] INFO CoreNLP - Server default properties:
                        (Note: unspecified annotator properties are English defaults)
                        inputFormat = text
                        outputFormat = json
                        prettyPrint = false
[main] INFO CoreNLP - Threads: 6
[main] INFO CoreNLP - Starting server...
[main] INFO CoreNLP - StanfordCoreNLPServer listening at /0:0:0:0:0:0:0:0:9000
[pool-1-thread-1] INFO CoreNLP - [/0.0.0.0:1116] API call w/annotators tokenize,pos,lemma,ner,parse,dcoref
Only the columns to be modified need be mentioned in the SET clause; columns not explicitly modified retain their previous values.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[pool-1-thread-1] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [1.4 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [2.0 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.5 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.7 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
[pool-1-thread-1] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4867 unique entries out of 4867 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585572 unique entries from 2 files
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - numeric classifiers: true; SUTime: true [no docDate]; fine grained: true
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[pool-1-thread-1] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.6 sec].
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator dcoref
[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: dependency

mercurial-moon avatar Oct 22 '24 05:10 mercurial-moon

If I understand correctly, the problem is that the web interface is returning different results, right? So when you go to the web interface, it might say something like "parts-of-speech, named entities, dependency parse" in the "Annotations" field. That specific list means it uses the direct-to-dependencies parser. The command line you're giving uses the constituency parser, then converts its trees to dependencies.

If you want more accurate dependencies, you probably want the depparse annotator for CoreNLP. Or there's Stanza, which has more accurate dependencies using a transformer (although it'll be slower and won't have the server feature)
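
To make the two routes concrete, here is a small Python sketch that builds the request URL for each one, mirroring the `properties` query parameter used in the wget command earlier in this thread. The base URL `http://localhost:9000`, the helper names, and the exact annotator lists are assumptions for illustration; the `annotate` function is defined but not called, since it needs a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTION: a CoreNLP server listening locally on port 9000.
BASE_URL = "http://localhost:9000"

# Constituency parse first, then conversion to dependencies.
CONSTITUENCY_ROUTE = ["tokenize", "ssplit", "pos", "lemma", "ner", "parse"]
# Direct-to-dependencies parser.
DIRECT_ROUTE = ["tokenize", "ssplit", "pos", "lemma", "ner", "depparse"]


def build_url(base_url, annotators, output_format="xml"):
    """Encode the annotator list as a JSON 'properties' query parameter."""
    properties = {
        "annotators": ",".join(annotators),
        "outputFormat": output_format,
    }
    return base_url + "/?" + urlencode({"properties": json.dumps(properties)})


def annotate(url, text):
    """POST the raw text as the request body and return the response body."""
    request = Request(url, data=text.encode("utf-8"))
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(build_url(BASE_URL, CONSTITUENCY_ROUTE))
    print(build_url(BASE_URL, DIRECT_ROUTE))
```

Using `urlencode` percent-encodes the braces and quotes in the JSON, which sidesteps the shell escaping needed in the wget version.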

AngledLuffa avatar Oct 22 '24 08:10 AngledLuffa

If I understand correctly, the problem is that the web interface is returning different results, right?

Yes, the web interface and the server API (XML output) give different results, although both are querying the same server. I also checked the JSON output, and that too differs between the server API and the web interface. Next I intend to check the protobuf version to see if that matches the web GUI.

That specific list means it uses the direct to dependencies parser. The command line you're giving using the constituency parser, then converts those to dependencies.

So is there a difference between the way the web GUI queries the server and the API method? Does the web interface use XML output from the server?

The reason all this matters is that non-programmers use the web interface while automated tasks use the API, and the two are producing different results.

If you want more accurate dependencies, you probably want the depparse annotator for CoreNLP.

Yes accuracy certainly matters but currently the mismatch in web interface and api is more of a concern.
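
One way to pin down such a mismatch is to diff the two edge sets directly. The sketch below is a hedged illustration: `API_XML` repeats the three edges from the XML quoted in this thread, while `GUI_XML` is hand-written from the description of the web GUI picture (it is not real server output), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Edges as returned by the API call quoted in this thread ("parse" annotator).
API_XML = """<dependencies>
  <dep type="nsubj:pass"><governor idx="9">mentioned</governor><dependent idx="3">columns</dependent></dep>
  <dep type="mark"><governor idx="6">modified</governor><dependent idx="4">to</dependent></dep>
  <dep type="aux:pass"><governor idx="6">modified</governor><dependent idx="5">be</dependent></dep>
</dependencies>"""

# ASSUMPTION: hand-written from the description of the web GUI picture,
# where "modified" governs all three words. Not real server output.
GUI_XML = """<dependencies>
  <dep type="nsubj:pass"><governor idx="6">modified</governor><dependent idx="3">columns</dependent></dep>
  <dep type="mark"><governor idx="6">modified</governor><dependent idx="4">to</dependent></dep>
  <dep type="aux:pass"><governor idx="6">modified</governor><dependent idx="5">be</dependent></dep>
</dependencies>"""


def edge_set(xml_text):
    """Collect each <dep> as a hashable tuple so two outputs can be compared."""
    edges = set()
    for dep in ET.fromstring(xml_text).iter("dep"):
        governor = dep.find("governor")
        dependent = dep.find("dependent")
        edges.add((
            dep.get("type"),
            int(governor.get("idx")), governor.text,
            int(dependent.get("idx")), dependent.text,
        ))
    return edges


def diff_edges(first_xml, second_xml):
    """Return (edges only in first, edges only in second), each sorted."""
    first, second = edge_set(first_xml), edge_set(second_xml)
    return sorted(first - second), sorted(second - first)


if __name__ == "__main__":
    only_api, only_gui = diff_edges(API_XML, GUI_XML)
    print("only in API output:", only_api)
    print("only in GUI output:", only_gui)
```

Run against full outputs from both interfaces, a diff like this lists every edge that differs, instead of relying on the ones someone happens to spot.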

mercurial-moon avatar Oct 22 '24 08:10 mercurial-moon

If you want more accurate dependencies, you probably want the depparse annotator for CoreNLP.

That seems to fix it. I added the depparse annotator to my existing list.

The annotators list is now

"tokenize", "ssplit", "pos", "lemma", "ner", "parse", "depparse" , "dcoref"

Now the web interface results match with api generated results.

Many thanks for the support. Much appreciated!

mercurial-moon avatar Oct 22 '24 09:10 mercurial-moon

Sounds good - just be aware that you may not need parse if all you want is dependencies (depparse)

AngledLuffa avatar Oct 22 '24 09:10 AngledLuffa