stanza
rose lemma problem
Describe the bug In 1.4.0, the lemma of the noun "rose" (the flower) is returned as "rise".
To Reproduce Steps to reproduce the behavior: Take the sentence "I gave her a rose" as an example. The POS tag of "rose" is correct (NN), but in 1.4.0 its lemma is "rise".
Expected behavior The lemma of "rose" should be "rose", not "rise". 1.3.0 returns the correct lemma.
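For anyone who wants to check this on their own install, here is a minimal sketch of the reproduction. The helper names `analyze` and `lemma_report` are made up for illustration, and the `reported` list simply restates the tag and lemma described above rather than captured output. `analyze` assumes the English models can be downloaded.

```python
def analyze(sentence):
    """Run tokenize/pos/lemma on a sentence and return (text, xpos, lemma) triples.

    Hypothetical helper: stanza is imported lazily so the rest of this
    snippet can be used without the models installed.
    """
    import stanza

    stanza.download("en")  # fetches the default English models if missing
    nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma")
    doc = nlp(sentence)
    return [(w.text, w.xpos, w.lemma) for s in doc.sentences for w in s.words]


def lemma_report(words, target):
    """Keep only the triples whose surface form matches `target` (case-insensitive)."""
    return [(text, xpos, lemma) for text, xpos, lemma in words
            if text.lower() == target.lower()]


# Values as described in this report for 1.4.0 (restated by hand, not program output).
reported = [("I", "PRP", "I"), ("rose", "NN", "rise")]
print(lemma_report(reported, "rose"))
```

With the models installed, `lemma_report(analyze("I gave her a rose"), "rose")` shows the tag and lemma your version actually produces.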
Environment
- OS: Debian
- Python version: 3.8.10 from miniconda
- Stanza version: 1.4.0
The main thing limiting me in debugging this is that I've never touched the lemmatizer, but perhaps it will be as simple as retraining with a couple of extra sentences.
@AngledLuffa the first thing to check is probably whether we have (NN, "rose") -> "rose" in the UD training set. If not, adding one to the training data would help (the lexicon-based part should be able to catch and remember this).
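One rough way to do that check, assuming the training data is a standard 10-column CoNLL-U file: count which lemmas appear for a given (form, XPOS) pair. The function name `count_lemmas` and the `sample` lines are hand-written for illustration and are not taken from the actual treebank.

```python
from collections import Counter


def count_lemmas(conllu_lines, form, xpos):
    """Count lemmas seen for tokens matching `form` (case-insensitive) and `xpos`."""
    counts = Counter()
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, tok_form, lemma, _upos, tok_xpos = cols[:5]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword token ranges and empty nodes
        if tok_form.lower() == form.lower() and tok_xpos == xpos:
            counts[lemma] += 1
    return counts


# Hand-written illustration lines, NOT real treebank data.
sample = [
    "# text = a rose",
    "1\ta\ta\tDET\tDT\t_\t2\tdet\t_\t_",
    "2\trose\trose\tNOUN\tNN\t_\t0\troot\t_\t_",
    "",
    "# text = She rose",
    "1\tShe\tshe\tPRON\tPRP\t_\t2\tnsubj\t_\t_",
    "2\trose\trise\tVERB\tVBD\t_\t0\troot\t_\t_",
]
print(count_lemmas(sample, "rose", "NN"))
```

To run it on a real file, open your training file with `encoding="utf-8"` and pass the file object as `conllu_lines`. An empty counter would mean the (NN, "rose") pair is missing from the data.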
It is there several times, unfortunately, so the fix won't be that easy
Fortunately, retraining the models is enough to get it to work for "rose". I didn't do anything special, which makes me wonder if another retraining later will flip the switch back to "not working". At any rate, you can install the dev branch or wait a week or two for v1.4.1 and get the updated lemma model.