
Output is "Stag" instead of "Stage"

Open ravishankar-cloud opened this issue 1 year ago • 4 comments

Please correct the following item in the WordNet database.

from nltk.stem import WordNetLemmatizer
ps = WordNetLemmatizer()
ps.lemmatize('staging', pos='v')

output: stag

Expected output: stage

I checked the WordNet database, and "stag" is listed there as a verb (stag.v.01):

from nltk.corpus import wordnet as wn

# Ensure you have downloaded the WordNet dataset
import nltk
nltk.download('wordnet')

# Look up "staging" in WordNet
synsets = wn.synsets('staging')

# Print the synsets and their definitions
for synset in synsets:
    print(f"{synset.name()}: {synset.definition()}")

theatrical_production.n.01: the production of a drama on the stage
scaffolding.n.01: a system of scaffolds
staging.n.03: travel by stagecoach
staging.n.04: getting rid of a stage of a multistage rocket
stage.v.01: perform (a play), especially on a stage
stage.v.02: plan, organize, and carry out (an event)
stag.v.01: attend a dance or a party without a female companion
denounce.v.04: give away information about somebody
spy.v.02: watch, observe, or inquire secretly
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Package wordnet is already up-to-date!

In spaCy we get the correct output:

import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Process the text
doc = nlp("staging")

# Get the lemma
lemma = doc[0].lemma_
print(lemma)  # Should output 'stage'

Output: stage

ravishankar-cloud, Sep 20 '24 07:09

The present participle of "to stag" is "stagging", so the problem lies in the morphological processing, not in the database. Other methods of the lemmatizer are available, for example morphy:

from nltk.stem import WordNetLemmatizer as wnl
print(wnl().morphy("staging", pos="v"))

stage
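To illustrate why the morphological step produces two candidates for "staging", here is a minimal, self-contained sketch. It does not call NLTK. The suffix rules are written in the style of WordNet's morphy verb rules, and `TOY_VERBS`, `candidates`, and `shortest` are hypothetical names made up for this example. Picking the shortest candidate is an assumption about how lemmatize might choose between them, consistent with the outputs reported in this thread.

```python
# Verb suffix rules in the style of WordNet's morphy (written out here as an
# assumption for illustration, not read from NLTK at runtime).
VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]

# Hypothetical stand-in for the set of verb lemmas known to WordNet.
TOY_VERBS = {"stage", "stag", "tag"}

def candidates(word, rules=VERB_RULES, lexicon=TOY_VERBS):
    """Apply every matching suffix rule and keep results found in the lexicon."""
    found = []
    for old, new in rules:
        if word.endswith(old):
            base = word[: len(word) - len(old)] + new
            if base in lexicon and base not in found:
                found.append(base)
    return found

def shortest(word):
    """Pick the shortest candidate, falling back to the word itself."""
    found = candidates(word)
    return min(found, key=len) if found else word

print(candidates("staging"))   # ['stage', 'stag']
print(shortest("staging"))     # stag
print(candidates("stagging"))  # []
```

In this sketch both "stage" and "stag" survive because each is a valid verb, so the choice between them decides the result. "stagging" yields nothing because no rule undoes the doubled consonant.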

ekaf, Sep 20 '24 22:09

None of the WordNetLemmatizer functions is able to lemmatize stagging:

from nltk.stem import WordNetLemmatizer as wnl
print(wnl().lemmatize("stagging", pos="v"))

stagging

print(wnl().morphy("stagging", pos="v"))

None

print(wnl()._morphy("stagging", pos="v"))

[]

Here, the solution would be to add a line in the exceptions file verb.exc, in analogy with tagging -> tag.
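As a rough illustration of how an exception entry would resolve "stagging", the sketch below parses exception lines and consults them before any rule-based fallback. It assumes each line of the exceptions file holds an inflected form followed by its base form(s), separated by spaces. `parse_exceptions`, `lookup`, `strip_ing`, and `SAMPLE` are hypothetical names for this example, and the `stagging stag` line is the proposed addition, not an existing entry.

```python
def parse_exceptions(text):
    """Map each inflected form to its base forms ('inflected base...' per line)."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0]] = parts[1:]
    return table

def lookup(word, exceptions, fallback):
    """Consult the exception table first, then the rule-based fallback."""
    if word in exceptions:
        return exceptions[word]
    return fallback(word)

def strip_ing(word):
    """Toy fallback rule: drop a trailing 'ing'."""
    return [word[:-3]] if word.endswith("ing") else []

# Hypothetical file content: the first line mirrors the tagging -> tag entry
# mentioned above, the second is the proposed new line.
SAMPLE = "tagging tag\nstagging stag\n"

table = parse_exceptions(SAMPLE)
print(lookup("stagging", table, strip_ing))  # ['stag']
print(lookup("walking", table, strip_ing))   # ['walk']
```

Because the table is checked first, an irregular form never reaches the suffix rules, which is why a single added line would be enough here.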

ekaf, Sep 21 '24 08:09

Hi! I will take care of this. Could you assign this issue to me? @ravishankar-cloud

pedroborgescruz, Oct 01 '24 14:10

Thanks @pedroborgescruz! If you plan to modify the database, this would be a data issue rather than an nltk issue. In that case, you should consider submitting your solution to the relevant WordNet project instead.

ekaf, Oct 03 '24 13:10