stanza
How to get the phrase from the node in the parse_tree.Tree?
Hello! I'm using Stanza's constituency parser. Given a node in a parse_tree.Tree, how can I get the phrase under that node? There don't seem to be any APIs for getting the text. Thanks for any reply!
tree.leaf_labels()
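To make that concrete: the phrase under a node is just that node's leaf labels joined together. The sketch below uses a hypothetical `Node` class that only roughly mimics the label/children shape of a Stanza tree, so it runs without downloading any models. With a real pipeline you would call `leaf_labels()` on a node of `doc.sentences[0].constituency` instead. The tree here is hand-written for illustration, not actual Stanza output, and the helper names `phrase`, `phrases_with_label` and `tagged` are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for stanza's parse_tree.Tree: a label plus child
    # nodes. Leaves hold the words; internal nodes hold tags such as NP.
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaf_labels(self) -> List[str]:
        # Collect the words at the leaves, left to right.
        if not self.children:
            return [self.label]
        words: List[str] = []
        for child in self.children:
            words.extend(child.leaf_labels())
        return words


def phrase(node: Node) -> str:
    # The surface phrase under a node is its leaf words joined by spaces.
    return " ".join(node.leaf_labels())


def phrases_with_label(node: Node, label: str) -> List[str]:
    # Pre-order walk returning the phrase of every internal node with `label`.
    found = []
    if node.children and node.label == label:
        found.append(phrase(node))
    for child in node.children:
        found.extend(phrases_with_label(child, label))
    return found


def tagged(tag: str, word: str) -> Node:
    # Preterminal helper: a POS tag node with a single word leaf.
    return Node(tag, [Node(word)])


# (ROOT (S (NP (NP (DT The) (NN back)) (PP (IN of) (NP (DT the) (NN chair))))
#          (VP (VBZ is) (NP (NN wood)))))
root = Node("ROOT", [Node("S", [
    Node("NP", [
        Node("NP", [tagged("DT", "The"), tagged("NN", "back")]),
        Node("PP", [tagged("IN", "of"),
                    Node("NP", [tagged("DT", "the"), tagged("NN", "chair")])]),
    ]),
    Node("VP", [tagged("VBZ", "is"), Node("NP", [tagged("NN", "wood")])]),
])])

print(phrase(root))                    # The back of the chair is wood
print(phrases_with_label(root, "NP"))  # ['The back of the chair', 'The back', 'the chair', 'wood']
```

Joining with single spaces does not restore the original spacing around punctuation, so a sentence ending in a period would come back as "... wood ." rather than "... wood.".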
Thank you very much!
By the way, where is the full documentation for Stanza? I couldn't find detailed API documentation on https://stanfordnlp.github.io/stanza/
There's plenty of pydoc in the modules, but we haven't actually built it into documentation or published it anywhere. You can look through it yourself with the pydoc module or by browsing the source tree. I can look for a way to add it to our github.io, but no promises that I'll find anything.
Another thing you can do is run help(tree) from within a Python shell to see any help for that specific object, generated from its pydoc. For example:
help(doc.sentences[0].constituency)
Or you can run the following to browse all of the installed pydoc in a web browser:
python -m pydoc -b
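If you would rather search the help text than page through it, the standard library can return it as a string. This is a sketch using only `pydoc` and `inspect`; the helper names `help_text` and `method_summaries` are made up for illustration, and `render_doc` output is close to, but not guaranteed identical to, what `help()` prints.

```python
import inspect
import pydoc


def help_text(obj) -> str:
    # Roughly the text help(obj) would show, returned as a string instead of
    # paged; the plaintext renderer omits terminal bold formatting.
    return pydoc.render_doc(obj, renderer=pydoc.plaintext)


def method_summaries(cls) -> dict:
    # Map each public callable on `cls` to the first line of its docstring,
    # a quick index of what an unfamiliar class offers.
    summaries = {}
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_"):
            continue
        doc = inspect.getdoc(member) or ""
        summaries[name] = doc.splitlines()[0] if doc else ""
    return summaries


# With Stanza you might call, e.g.:
#   tree = doc.sentences[0].constituency
#   print(method_summaries(type(tree)))
```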
Thank you very much! I will refer to the pydoc often!
I have two more small questions:
- Why is the file resources_1.4.0.json downloaded again every time I run nlp = stanza.Pipeline('en', processors='tokenize,pos,constituency', use_gpu=False)? Is there any way to avoid this?
- Can I change how a certain word is labeled? For example, "back" should be a noun (NP) when it refers to furniture (e.g. chairs), but Stanza gives me a preposition (PP).
Thanks for any reply!
I'm actually not sure I buy that. Is "back" in "back of the bus" a noun? A person's back is probably a noun, though... and maybe the chair case is closer to that than to the bus example.
Anyway, you can compile a list of examples and we'll incorporate some into the training data. You could also manually intervene between tagging and parsing, but that might be a bit of a headache.
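Manually intervening between tagging and parsing amounts to correcting POS tags before the parser sees them. The sketch below shows only that correction step, on plain (word, tag) pairs; the rule, the function name `retag_back_as_noun`, and the input tags are hypothetical, and it does not show how to hand corrected tags back to Stanza's constituency parser.

```python
from typing import List, Tuple

# (word, tag) pairs using Penn Treebank-style tags.
Tagged = List[Tuple[str, str]]


def retag_back_as_noun(tagged: Tagged) -> Tagged:
    # Hypothetical rule: "back" preceded by a determiner and followed by
    # "of" ("the back of ...") is retagged as a common noun, NN.
    fixed = list(tagged)  # copy so the caller's list is left untouched
    for i, (word, _tag) in enumerate(fixed):
        if word.lower() != "back":
            continue
        prev_is_det = i > 0 and fixed[i - 1][1] == "DT"
        next_is_of = i + 1 < len(fixed) and fixed[i + 1][0].lower() == "of"
        if prev_is_det and next_is_of:
            fixed[i] = (word, "NN")
    return fixed


# Made-up tagger output with "back" mis-tagged as an adverb.
sentence = [("The", "DT"), ("back", "RB"), ("of", "IN"), ("the", "DT"),
            ("chair", "NN"), ("is", "VBZ"), ("wood", "NN")]
print(retag_back_as_noun(sentence)[1])  # ('back', 'NN')
```

Hand-written rules like this pile up quickly, which is the headache mentioned above; adding examples to the training data scales better.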
If you're using this a lot, we have more accurate models coming in 1.4.1. You could use the dev branch, or I could put together a preliminary release later this week.
(In reply to Yuxuan Kuang's comment of Sun, Aug 14, 2022, 7:11 AM: https://github.com/stanfordnlp/stanza/issues/1096#issuecomment-1214386230)
Oh yeah, about the resources file being downloaded again and again:
Sure, set download_method=None when creating the Pipeline.
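For reference, the constructor call from the question with that option added might look like the sketch below. It assumes the English models were already downloaded once (for example with stanza.download("en")), since with download_method=None nothing is fetched. The names `PIPELINE_OPTIONS` and `load_parser` are only for illustration.

```python
# Options for an English constituency pipeline. download_method=None stops
# stanza from re-fetching resources_*.json (and models) on each construction,
# so the models must already be on disk, e.g. from a one-time stanza.download("en").
PIPELINE_OPTIONS = dict(
    lang="en",
    processors="tokenize,pos,constituency",
    use_gpu=False,
    download_method=None,
)


def load_parser():
    import stanza  # imported here so this snippet loads even without stanza installed
    return stanza.Pipeline(**PIPELINE_OPTIONS)
```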
One last note: it appears that the dev branch, with its more accurate POS tagger, tags "back" as a noun in the sentence "The back of the chair is wood".
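To check which tag "back" gets on your own install, a small helper over the processed Document is enough. It relies on the usual Document, sentences, words structure with text, upos and xpos attributes on each word; the helper name `tags_for` is hypothetical.

```python
from typing import List, Tuple


def tags_for(doc, target: str) -> List[Tuple[str, str, str]]:
    # Return (text, upos, xpos) for every word in a processed stanza Document
    # whose text matches `target`, ignoring case.
    return [
        (word.text, word.upos, word.xpos)
        for sentence in doc.sentences
        for word in sentence.words
        if word.text.lower() == target.lower()
    ]


# Usage with a pipeline that includes the pos processor, e.g.:
#   doc = nlp("The back of the chair is wood")
#   print(tags_for(doc, "back"))
# A noun reading would show up as upos NOUN and xpos NN.
```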