How to get the phrase from the node in the parse_tree.Tree?

Open yxKryptonite opened this issue 3 years ago • 7 comments

Hello! I'm using Stanza's constituency parser. When I have a node in the parse_tree.Tree, how can I get the phrase under that node? There doesn't seem to be an API for getting the text. Thanks for any reply!

yxKryptonite avatar Aug 14 '22 05:08 yxKryptonite

tree.leaf_labels()
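
As a minimal sketch of how that might be used: the helpers below join the leaf words under a node into a phrase and collect the phrases under every node with a given label. Only leaf_labels() comes from this thread. The use of node.label and node.children, the helper names phrase_under and phrases_with_label, and the demo() function are assumptions made for illustration, and demo() needs stanza and the English models installed.

```python
def phrase_under(node):
    """Return the words under a constituency node as one space-joined string."""
    # leaf_labels() is the Tree method named in this thread.
    return " ".join(node.leaf_labels())


def phrases_with_label(tree, label):
    """Collect the phrase under every node whose label matches, e.g. 'NP'."""
    found = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.label == label:
            found.append(phrase_under(node))
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return found


def demo():
    # Imported here so the helpers above can be used without loading a pipeline.
    import stanza

    nlp = stanza.Pipeline("en", processors="tokenize,pos,constituency", use_gpu=False)
    doc = nlp("The back of the chair is wood.")
    tree = doc.sentences[0].constituency
    print(phrase_under(tree))
    print(phrases_with_label(tree, "NP"))
```

Calling demo() should print the full sentence text and then the noun phrases the parser found; the exact phrases depend on the model version.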

AngledLuffa avatar Aug 14 '22 06:08 AngledLuffa

Thank you very much!

yxKryptonite avatar Aug 14 '22 11:08 yxKryptonite

By the way, where is the full documentation for Stanza? I couldn't find a detailed API reference on https://stanfordnlp.github.io/stanza/

yxKryptonite avatar Aug 14 '22 11:08 yxKryptonite

There's a bunch of pydoc in the module, but we haven't actually built it and put it anywhere. You can look through it yourself with the pydoc module or by browsing the source tree. I can look for a way to add it to our github.io, but no promises that I'll find anything.

AngledLuffa avatar Aug 14 '22 13:08 AngledLuffa

Another thing you can do is, from within a python shell, run help(tree) to see if there's any help for that specific object, with the help generated from the pydoc.

help(doc.sentences[0].constituency)

Or you can run

python -m pydoc -b
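
If the full help() output is more than you need, a short helper built on the standard inspect module can list an object's public methods with the first line of each docstring. This is only a sketch: public_methods is a hypothetical name, not part of Stanza, and what it prints depends on which docstrings the installed version has.

```python
import inspect


def public_methods(obj):
    """Map each public callable attribute of obj to the first line of its docstring."""
    summary = {}
    for name, member in inspect.getmembers(obj, callable):
        if name.startswith("_"):
            continue  # skip private and dunder members
        doc = inspect.getdoc(member) or ""
        summary[name] = doc.splitlines()[0] if doc else ""
    return summary
```

For example, `for name, line in public_methods(doc.sentences[0].constituency).items(): print(name, "-", line)` would give a quick overview of what the tree object offers.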

AngledLuffa avatar Aug 14 '22 13:08 AngledLuffa

Thank you very much! I will refer to the pydoc often!

And I have two more little questions:

  1. Why is the file resources_1.4.0.json downloaded again every time I run nlp = stanza.Pipeline('en', processors='tokenize,pos,constituency', use_gpu=False)? Is there a way to avoid this?
  2. Can I change the category assigned to a certain word? For example, "back" is a noun (NP) when it refers to furniture (e.g. chairs), but Stanza gave me a preposition (PP).

Thanks for any reply!

yxKryptonite avatar Aug 14 '22 14:08 yxKryptonite

I'm actually not sure I buy that. Is "back" in "back of the bus" a noun? Although a person's back is probably a noun... maybe the chair case is closer to that than to the bus example.

Anyway, you can compile a list of examples and we'll incorporate some into the training data. You could also manually intervene between tagging and parsing, but that might be a bit of a headache.

If you're using this a lot, we have more accurate models coming in 1.4.1. You could use the dev branch, or I could put together a preliminary release later this week.

AngledLuffa avatar Aug 14 '22 19:08 AngledLuffa

Oh yeah:

resources was downloaded again and again? Is there any solution to avoid this?

Sure, set download_method=None when creating the Pipeline.
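
Applied to the Pipeline call from the question, that might look like the sketch below. It assumes the English models are already on disk from an earlier run, since with download_method=None nothing is fetched. PIPELINE_KWARGS and build_pipeline are hypothetical names used only for illustration.

```python
# Same arguments as in the question, plus download_method=None so the
# resources file is not fetched again on every start.
PIPELINE_KWARGS = {
    "lang": "en",
    "processors": "tokenize,pos,constituency",
    "use_gpu": False,
    "download_method": None,
}


def build_pipeline():
    # Imported here so the settings above can be inspected without loading stanza.
    import stanza

    # Assumes the models were downloaded previously; nothing is fetched here.
    return stanza.Pipeline(**PIPELINE_KWARGS)
```

Calling nlp = build_pipeline() then gives a pipeline that is used exactly as before.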

AngledLuffa avatar Aug 14 '22 21:08 AngledLuffa

One last note: it appears the dev branch, with its more accurate POS tagger, tags "back" as a noun in the sentence "The back of the chair is wood".

AngledLuffa avatar Aug 15 '22 02:08 AngledLuffa