CoreNLP

Question about constituency score

Open • RLangridge opened this issue 2 years ago • 8 comments

Hi there. My apologies, my background isn't in NLP, so my question may not make much sense. I'm trying to get the constituency score from a constituency parse of my sentences. I was able to jerry-rig my way around this in Stanza by grabbing the source code and writing my own way of retrieving it, but I was wondering if there's an out-of-the-box way to do this with CoreNLP. Any information would be appreciated.

Thanks!

RLangridge • Aug 25 '23 05:08

Do you need it to be from the command line, or from a Java function call?

If you want something from the command line, you can look at edu/stanford/nlp/parser/metrics/Evalb.java, although its main method is not well documented at all.
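For a sense of what an evalb-style score measures, here is a minimal Python sketch (not CoreNLP's Evalb.java): it compares a parser's tree against a gold tree by their labeled brackets and reports bracket F1. It is deliberately simplified. It counts every non-preterminal bracket, including the root, and skips evalb's usual parameter rules such as ignoring punctuation. The names `parse_tree`, `labeled_spans`, and `bracket_f1` are hypothetical.

```python
from collections import Counter


def parse_tree(text):
    """Parse a PTB-style bracketed string into (label, children) tuples.

    Leaves (words) are plain strings. A bracket with no label, like the
    outer "( (S ...))" in PTB files, gets the empty label "".
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return parse_node()


def labeled_spans(tree):
    """Count (label, start, end) brackets, skipping preterminals (POS tags)."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        if all(isinstance(c, str) for c in children):
            return start + len(children)  # preterminal: advance past its words only
        end = start
        for child in children:
            end = end + 1 if isinstance(child, str) else walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_f1(gold_text, guess_text):
    """Labeled bracket F1 of a guessed parse against a gold parse of the same sentence."""
    gold = labeled_spans(parse_tree(gold_text))
    guess = labeled_spans(parse_tree(guess_text))
    matched = sum((gold & guess).values())  # multiset intersection of brackets
    if matched == 0:
        return 0.0
    precision = matched / sum(guess.values())
    recall = matched / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


gold = "(ROOT (S (NP (DT the) (NN dog)) (VP (VBD barked))))"
guess = "(ROOT (NP (DT the) (NN dog) (VP (VBD barked))))"
print(bracket_f1(gold, guess))  # about 0.571: 2 of the 3 guessed brackets appear among the 4 gold ones
```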

If you need a Java function call, you can look at the EvaluateTreebank class, where you do something like:

EvaluateTreebank evaluator = new EvaluateTreebank(....);

evaluator.testOnTreebank(devTreebank);

Again, though, this is somewhat poorly documented. You can follow along with shiftreduce/PerceptronModel or lexparser/LexicalizedParser, which both use it.
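Running over a whole treebank produces one overall score rather than one per sentence. As a hedged illustration (not CoreNLP's code), evalb-style tools usually pool the bracket counts across all sentences before computing F1 (micro-averaging), which generally gives a different number than averaging per-sentence F1. The helpers `corpus_f1` and `mean_sentence_f1` and the per-sentence counts in the example are hypothetical.

```python
# Each tuple is (matched brackets, brackets in the gold tree, brackets in the parser's tree).


def f1(matched, gold, guess):
    """Bracket F1 from raw counts; 0.0 when nothing matched."""
    if matched == 0:
        return 0.0
    precision = matched / guess
    recall = matched / gold
    return 2 * precision * recall / (precision + recall)


def corpus_f1(sentences):
    """Micro-average: pool counts over all sentences, then compute one F1."""
    sentences = list(sentences)
    matched = sum(m for m, _, _ in sentences)
    gold = sum(g for _, g, _ in sentences)
    guess = sum(p for _, _, p in sentences)
    return f1(matched, gold, guess)


def mean_sentence_f1(sentences):
    """Macro-average: F1 for each sentence, then the plain mean."""
    sentences = list(sentences)
    return sum(f1(*s) for s in sentences) / len(sentences)


data = [(2, 4, 3), (10, 10, 10)]  # made-up counts for two sentences
print(corpus_f1(data), mean_sentence_f1(data))  # about 0.889 vs 0.786
```

The pooled score weights long sentences (which have more brackets) more heavily, which is why the two numbers diverge.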

AngledLuffa • Aug 25 '23 06:08

Thanks for this information. I'll have a look at both and work out which would be best.

RLangridge • Aug 25 '23 06:08

Just to check, is there any way to implement this via the CoreNLPClient? That's what we're using at the moment.

RLangridge • Aug 27 '23 23:08

Yes, look at

from stanza.server.parser_eval import EvaluateParser, ParseResult

as used in

stanza/models/constituency/trainer.py

Note, though, that this starts a new Java process rather than talking to a server. It therefore does not work from one machine to another: CoreNLP has to be installed on the machine you are running it on.
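Because this starts its own local Java process, a quick check that CoreNLP is actually present on the machine can turn a confusing failure into a clear one. This is a hypothetical helper, not part of Stanza. It assumes the install directory is either passed in or named by the CORENLP_HOME environment variable (which Stanza's client uses to locate CoreNLP), and that the main jar is named stanford-corenlp-*.jar.

```python
import os
from pathlib import Path


def local_corenlp_jars(corenlp_home=None):
    """Return the jar files of a local CoreNLP install, or raise if none is found.

    Assumes the directory is given explicitly or via CORENLP_HOME, and that
    the main jar's file name starts with "stanford-corenlp".
    """
    home = corenlp_home or os.environ.get("CORENLP_HOME")
    if not home:
        raise RuntimeError("CORENLP_HOME is not set and no directory was given")
    jars = sorted(Path(home).glob("*.jar"))
    if not any(jar.name.startswith("stanford-corenlp") for jar in jars):
        raise RuntimeError(f"no stanford-corenlp jar found in {home}")
    return jars
```

Calling it once before starting the evaluation confirms the local install rather than discovering its absence partway through.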

AngledLuffa • Aug 27 '23 23:08

Great, thank you. I'll give that a look.

RLangridge • Aug 27 '23 23:08

And sorry, just one last question. I want to use constituency parsing on the following languages:

  • Serbian
  • Turkish
  • Dutch
  • Spanish
  • Italian
  • German

I know that, out of the box, some of these languages aren't supported by CoreNLP, or are supported but without constituency parsing specifically. Is there any way to load models from Hugging Face to perform constituency parsing, or, for each language here that isn't supported, would I need to look for a language-specific project or library?

RLangridge • Aug 28 '23 04:08

I spent an extensive amount of time searching for a constituency treebank in Dutch, but didn't come up with anything usable. If you have any suggestions, I will be happy to make such a parser model (probably for Stanza, whose constituency parser is much more accurate than CoreNLP's).

We have Italian in Stanza. We're working on updating those models with better versions, which should be ready in another week or two. There are VIT and TUT, two separate annotation schemes, one from Venice and one from Turin. (Figuring out which is which is left as an exercise for the reader.)

For German, there was a guy working on a huge German constituency treebank, but he wasn't comfortable sharing it for the purpose of releasing a publicly accessible model. CoreNLP has a model built from an early version of that work.

Spanish has a few usable treebanks, and in fact both CoreNLP and Stanza have a model built from all of those treebanks combined. We never split it into separate per-treebank models.

I do not believe a constituency treebank of any form exists for Serbian. If you know of such a thing, we'll be happy to make that parser as well.

For Turkish, a while back I found a company that translated part of the PTB (Penn Treebank) into Turkish and reparsed it. How useful that is on text that isn't translated English newswire, you'd have to tell me, but we do actually have a model for it in Stanza. Maybe I can revisit that group to see if they ever expanded that work.

AngledLuffa • Aug 28 '23 05:08

Great, thank you so much for your time; it's very much appreciated.

RLangridge • Aug 28 '23 23:08