CoreNLP
Question about constituency score
Hi there. My apologies, my background isn't in NLP, so my question may not make much sense here. I'm trying to get the constituency score from a constituency parse of my sentences. I was able to jerry-rig my way around this in Stanza by grabbing the source code and writing my own way of retrieving it, but I was wondering if there's an out-of-the-box way to do this with CoreNLP. Any information would be appreciated.
Thanks!
Do you need it to be from the command line, or from a Java function call?
If you want something from the command line, you can look at edu/stanford/nlp/parser/metrics/Evalb.java, although the main method is not well documented at all.
If you need a Java function call, you can look at the EvaluateTreebank class, where you do something like:
EvaluateTreebank evaluator = new EvaluateTreebank(....);
evaluator.testOnTreebank(devTreebank);
Again, though, this is somewhat poorly documented. You can follow along with shiftreduce/PerceptronModel or lexparser/LexicalizedParser, both of which use it.
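For intuition about what an Evalb-style score measures, here is a minimal sketch in plain Python of labeled bracket precision, recall, and F1 between a gold tree and a predicted tree. It is not CoreNLP's or Stanza's implementation: the helper names `parse_tree`, `labeled_spans`, and `bracket_f1` are made up for illustration, it assumes every bracket carries a label, and it skips the refinements a real evaluator applies (punctuation handling, label equivalences, and so on).

```python
from collections import Counter


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves (words) are plain strings. Assumes every "(" is followed by a label.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def labeled_spans(tree):
    """Count (label, start, end) spans, skipping preterminals (POS tags)."""
    spans = Counter()

    def walk(node, start):
        if isinstance(node, str):
            return start + 1  # a word covers one position
        label, children = node
        end = start
        for child in children:
            end = walk(child, end)
        is_preterminal = len(children) == 1 and isinstance(children[0], str)
        if not is_preterminal:
            spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_f1(gold_text, pred_text):
    """Return (precision, recall, f1) over labeled constituent spans."""
    gold = labeled_spans(parse_tree(gold_text))
    pred = labeled_spans(parse_tree(pred_text))
    matched = sum((gold & pred).values())  # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = "(S (NP (DT the) (NN dog)) (VP (VBZ chased) (NP (DT the) (NN cat))))"
pred = "(S (NP (DT the) (NN dog)) (VP (VBZ chased)) (NP (DT the) (NN cat)))"
precision, recall, f1 = bracket_f1(gold, pred)
print(precision, recall, f1)
```

In this toy pair the predicted tree attaches the object NP to S instead of VP, so three of the four non-preterminal brackets match on each side. For real evaluation, the Evalb and EvaluateTreebank classes mentioned above are the tools to use.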
Thanks for this information. I'll have a look at both and work out which would be best.
Just to check, is there any way to do this via the CoreNLPClient? That's what we're currently using.
Yes, look at
from stanza.server.parser_eval import EvaluateParser, ParseResult
such as what is used in
stanza/models/constituency/trainer.py
Note, though, that this invokes a new Java process rather than talking to a server. It therefore does not work from one machine to another; CoreNLP has to be installed on the machine you are running on.
Great, thank you. I'll give that a look.
And sorry just one last question. I'm wanting to use constituency parsing on the following languages:
- Serbian
- Turkish
- Dutch
- Spanish
- Italian
- German
I know that, out of the box, some of these languages aren't supported by CoreNLP, or are supported but without constituency parsing specifically. Is there any way to load models from Hugging Face to perform constituency parsing, or would I need to look for a language-specific project/library for each language here that isn't supported?
I spent an extensive amount of time searching for a constituency treebank in Dutch, but didn't come up with anything usable. If you have any suggestions, I will be happy to make such a parser model (probably for Stanza, whose constituency parser is much more accurate than CoreNLP's).
We have Italian in Stanza. We're working on updating those models with better versions - should be prepared in another week or two. There is VIT and TUT, two separate annotation schemes, one from Venice and one from Turin. (Figuring out which is which will be left as an exercise for the reader.)
For German, there was someone working on a huge German constituency treebank, but they were uncomfortable sharing it for the purposes of releasing a publicly accessible model. CoreNLP has a model built from an early version of that work.
Spanish has a few usable treebanks, and in fact we have models built out of all of those treebanks put together in both CoreNLP and Stanza. We never split it up into individual pieces.
I do not believe a constituency treebank of any form exists for Serbian. If you know of such a thing, we'll be happy to make that parser as well.
For Turkish, a while back I found a company that translated some of PTB into Turkish and reparsed it. How useful that is on text that isn't translated English newswire, you'd have to tell me, but we actually do have a model for that in Stanza. Maybe I can revisit that group to see if they ever expanded that work at all.
Great, thank you so much for your time. It's very much appreciated.