rose lemma problem

Open ErwinLiYH opened this issue 3 years ago • 4 comments
Describe the bug: In 1.4.0, the lemma of "rose" (the flower) is "rise".

To Reproduce: Take the sentence "I gave her a rose" as an example. In 1.4.0 the POS tag of "rose" is correct (NN), but its lemma is "rise".
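A minimal sketch of the reproduction, assuming the English models have already been fetched with `stanza.download("en")`. The helper names `lemmas_for` and `reproduce` are made up for illustration, and the processor list is an assumption (later Stanza releases may also want `mwt` for English).

```python
def lemmas_for(doc, surface):
    """Return (xpos, lemma) pairs for every word in `doc` whose text equals `surface`.

    `doc` is expected to look like a Stanza Document: it has `.sentences`,
    each sentence has `.words`, and each word has `.text`, `.xpos`, `.lemma`.
    """
    return [
        (word.xpos, word.lemma)
        for sentence in doc.sentences
        for word in sentence.words
        if word.text == surface
    ]


def reproduce(sentence="I gave her a rose", surface="rose"):
    """Run the English pipeline on `sentence` and report the tag and lemma of `surface`."""
    # Imported lazily so lemmas_for() can be used without Stanza installed.
    import stanza

    # Assumption: tokenize + pos + lemma is enough to reproduce the report.
    nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma")
    return lemmas_for(nlp(sentence), surface)
```

Calling `reproduce()` under 1.4.0 should show the behavior reported here (tag NN, lemma "rise"); the same call under 1.3.0 is reported to give the lemma "rose".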

Expected behavior: The lemma of "rose" should be "rose", not "rise". This worked correctly in 1.3.0.

Environment

  • OS: Debian
  • Python version: 3.8.10 from miniconda
  • Stanza version: 1.4.0

ErwinLiYH avatar Jul 20 '22 01:07 ErwinLiYH

The number one limitation to me debugging this is that I've never touched the lemmatizer whatsoever, but perhaps it will be as simple as retraining with a couple extra sentences.

AngledLuffa avatar Jul 21 '22 00:07 AngledLuffa

@AngledLuffa the first thing to check is probably whether we have (NN, "rose") -> "rose" in the UD training set. If not, adding one to the training data would help (the lexicon-based part should be able to catch and remember this).
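The check suggested above could be sketched roughly like this. It scans CoNLL-U lines (ten tab-separated columns, with FORM, LEMMA and XPOS in columns 2, 3 and 5) and counts which lemmas a given (form, XPOS) pair maps to. The function name `lemma_counts` is hypothetical, and which training file Stanza actually uses is not stated in this thread.

```python
from collections import Counter


def lemma_counts(conllu_lines, form, xpos):
    """Count the lemmas assigned to `form` when tagged `xpos` in CoNLL-U data.

    `conllu_lines` is any iterable of lines, e.g. an open file handle.
    """
    counts = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        # Skip sentence separators and comment lines such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_form, token_lemma, token_xpos = cols[1], cols[2], cols[4]
        if token_form.lower() == form.lower() and token_xpos == xpos:
            counts[token_lemma] += 1
    return counts
```

With a training file at a path of your choosing (a placeholder here), `lemma_counts(open("train.conllu", encoding="utf-8"), "rose", "NN")` would show whether "rose" appears as a lemma for the noun reading and how often.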

qipeng avatar Jul 21 '22 04:07 qipeng

It is there several times, unfortunately, so the fix won't be that easy.

AngledLuffa avatar Jul 21 '22 04:07 AngledLuffa

Fortunately, retraining the models is enough to get it to work for "rose". I didn't do anything special, which makes me wonder if another retraining later will flip the switch back to "not working". At any rate, you can install the dev branch or wait a week or two for v1.4.1 and get the updated lemma model.

AngledLuffa avatar Jul 21 '22 20:07 AngledLuffa