course-nlp
SpaCy Lemmatizer use in Lesson 2
In 2-svd-nmf-topic-modeling.ipynb, under the section Spacy, you use:
from spacy.lemmatizer import Lemmatizer
lemmatizer = Lemmatizer()
[lemmatizer.lookup(word) for word in word_list]
Unfortunately, this creates an empty lemmatizer that always returns its input unchanged, which may give the wrong impression.
Instead you should use something like:
import spacy
# Load the small English model so the lemmatizer has real lookup data
nlp = spacy.load("en_core_web_sm")
lemmatizer = nlp.Defaults.create_lemmatizer()
[lemmatizer.lookup(word) for word in word_list]
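To illustrate why the parameterless constructor appears to work but changes nothing, here is a minimal plain-Python sketch of table-based lookup lemmatization. It is not spaCy's code: the class LookupLemmatizer and the toy table are hypothetical and made up for illustration, and the only assumption is that a lookup lemmatizer falls back to the input word when it has no entry for it.

```python
# Plain-Python sketch of lookup lemmatization; this is NOT spaCy's implementation.
class LookupLemmatizer:
    """Hypothetical stand-in for a table-based lemmatizer."""

    def __init__(self, table=None):
        # With no table supplied, the lookup dictionary is empty.
        self.table = dict(table or {})

    def lookup(self, word):
        # Return the stored lemma, or the word itself when it is missing.
        return self.table.get(word, word)


word_list = ["feet", "foot", "foots", "footing"]

# An "empty" lemmatizer, analogous to calling the constructor with no data.
empty = LookupLemmatizer()
# A lemmatizer with a toy table (entries invented for this example).
filled = LookupLemmatizer({"feet": "foot", "foots": "foot"})

empty_result = [empty.lookup(word) for word in word_list]
filled_result = [filled.lookup(word) for word in word_list]
print(empty_result)   # every word comes back unchanged
print(filled_result)  # words present in the table are mapped to their lemma
```

With an empty table no call ever fails, so the notebook cell runs cleanly while doing no lemmatization at all.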
Also the command to download the English model at the start of this section is written as:
spacy -m download en_core_web_sm
when it should either be python -m spacy download en_core_web_sm
or spacy download en_core_web_sm
Thanks
thanks for the assistance
I have issues with lecture 2.
nlp = spacy.load("en")
It returns an error.
In spaCy version 2.2.3, the parameterless constructor has been removed. The code now raises an error:
TypeError: __init__() missing 1 required positional argument: 'lookups'
Thanks for the workaround @EdwardJRoss
This changed in spaCy v3 (see https://spacy.io/usage/linguistic-features#lemmatization), as covered in this discussion: https://github.com/explosion/spaCy/discussions/9235
If you install the package spacy-lookups-data, you can replace the rule-based lemmatizer with a lookup lemmatizer.
pip install spacy[lookups]
This is a workaround for spaCy v3:
import spacy
# Blank English pipeline with a lookup-mode lemmatizer (needs spacy-lookups-data)
nlp = spacy.blank('en')
nlp.add_pipe('lemmatizer', config={'mode':'lookup'}).initialize()
[nlp(word)[0].lemma_ for word in word_list]
This seems to not immediately fail: https://issueexplorer.com/issue/fastai/course-nlp/3