nltk
Output is "Stag" instead of "Stage"
Please correct the following item in the WordNet database.
from nltk.stem import WordNetLemmatizer
ps = WordNetLemmatizer()
ps.lemmatize('staging', pos='v')
output: stag
Expected output: stage
I checked the WordNet database, and stag is listed among the synsets for "staging" as stag.v.01 (in synset names the v marks a verb sense):
from nltk.corpus import wordnet as wn

# Ensure you have downloaded the WordNet dataset
import nltk
nltk.download('wordnet')

# Look up "staging" in WordNet
synsets = wn.synsets('staging')

# Print the synsets and their definitions
for synset in synsets:
    print(f"{synset.name()}: {synset.definition()}")
theatrical_production.n.01: the production of a drama on the stage
scaffolding.n.01: a system of scaffolds
staging.n.03: travel by stagecoach
staging.n.04: getting rid of a stage of a multistage rocket
stage.v.01: perform (a play), especially on a stage
stage.v.02: plan, organize, and carry out (an event)
stag.v.01: attend a dance or a party without a female companion
denounce.v.04: give away information about somebody
spy.v.02: watch, observe, or inquire secretly
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
In spaCy we get the correct output:
import spacy

# Load the English model
nlp = spacy.load('en_core_web_sm')

# Process the text
doc = nlp("staging")

# Get the lemma
lemma = doc[0].lemma_
print(lemma)  # Should output 'stage'
Output: stage
The present participle of "to stag" is "stagging", so the problem lies in the morphological processing, not in the database. Other lemmatizer methods are available, such as morphy:
from nltk.stem import WordNetLemmatizer as wnl
print(wnl().morphy("staging", pos="v"))
stage
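Why lemmatize and morphy can disagree on "staging" may be easier to see in a toy model. The sketch below is an assumption about the mechanism, not NLTK's actual code: it supposes that verb suffix rules are tried in a fixed order, that every candidate found in the database is kept, and that a lemmatize-like function takes the shortest candidate while a morphy-like function takes the first. The names VERB_RULES, TOY_VERBS, candidates, pick_first and pick_shortest are all hypothetical, and the toy set stands in for the real database.

```python
# Simplified model of WordNet-style verb lemmatization (not NLTK's actual code).
# Assumed verb suffix rules as (suffix, replacement), tried in this order.
VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]

# Toy stand-in for the set of verb lemmas present in the database.
TOY_VERBS = {"stage", "stag", "tag"}


def candidates(word):
    """Return every known lemma reachable by one suffix rule, in rule order."""
    found = []
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)] + replacement
            if stem in TOY_VERBS and stem not in found:
                found.append(stem)
    return found


def pick_first(word):
    """Assumed morphy-like behaviour: the first candidate, or None."""
    found = candidates(word)
    return found[0] if found else None


def pick_shortest(word):
    """Assumed lemmatize-like behaviour: the shortest candidate, or the input."""
    found = candidates(word)
    return min(found, key=len) if found else word


print(candidates("staging"))      # ['stage', 'stag']
print(pick_first("staging"))      # stage
print(pick_shortest("staging"))   # stag
print(pick_shortest("stagging"))  # stagging
```

Under these assumptions both "stage" and "stag" are valid candidates for "staging", and only the selection step differs. The model deliberately ignores exception lists and the case where the input word is itself a lemma.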
None of the WordNetLemmatizer functions is able to lemmatize stagging:
from nltk.stem import WordNetLemmatizer as wnl
print(wnl().lemmatize("stagging", pos="v"))
stagging
print(wnl().morphy("stagging", pos="v"))
None
print(wnl()._morphy("stagging", pos="v"))
[]
Here, the solution would be to add a line in the exceptions file verb.exc, in analogy with tagging -> tag.
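To illustrate what such an exception entry would do, here is a minimal sketch of an exception lookup. It assumes the exception file holds one entry per line, with the inflected form followed by its base form(s), separated by whitespace. The "stagging stag" line is the proposed addition, not an existing entry, and EXCEPTION_TEXT, parse_exceptions and lookup are hypothetical names rather than NLTK functions.

```python
# Hypothetical exception entries in an "inflected_form base_form" layout.
# The "stagging stag" line is the proposed addition, not an existing entry.
EXCEPTION_TEXT = """\
tagging tag
stagging stag
"""


def parse_exceptions(text):
    """Map each inflected form to its list of base forms."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0]] = parts[1:]
    return table


def lookup(word, table):
    """Return the base forms for an irregular form, or an empty list."""
    return table.get(word, [])


verb_exceptions = parse_exceptions(EXCEPTION_TEXT)
print(lookup("stagging", verb_exceptions))  # ['stag']
print(lookup("staging", verb_exceptions))   # []
```

An entry of this kind only helps with "stagging"; "staging" has no entry in the table, so it would still go through the regular suffix rules.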
Hi! I will take care of this. Could you assign this issue to me? @ravishankar-cloud
Thanks @pedroborgescruz! If you plan to modify the database, that would be a data issue rather than an nltk issue, so you should consider submitting your solution to the relevant WordNet project instead.