course-nlp

SpaCy Lemmatizer use in Lesson 2

Open EdwardJRoss opened this issue 5 years ago • 5 comments

In 2-svd-nmf-topic-modeling.ipynb under the section Spacy you use:

from spacy.lemmatizer import Lemmatizer
lemmatizer = Lemmatizer()
[lemmatizer.lookup(word) for word in word_list]

Unfortunately, this creates an empty lemmatizer that always returns its input unchanged, which may give the wrong impression.
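To make the failure mode concrete, here is a minimal pure-Python sketch of how a lookup-style lemmatizer behaves. It is not spaCy's implementation: the class name LookupLemmatizer, the sample word_list, and the two table entries are hypothetical and chosen only for illustration. With an empty table, every word falls through to the default and comes back unchanged:

```python
# Hypothetical stand-in for a lookup lemmatizer (not spaCy's actual class).
class LookupLemmatizer:
    def __init__(self, table=None):
        # Maps an inflected form to its lemma; empty when no data is loaded.
        self.table = dict(table or {})

    def lookup(self, word):
        # Fall back to the word itself when the table has no entry for it.
        return self.table.get(word, word)


# Sample input, assumed for illustration.
word_list = ["feet", "foot", "foots", "footing"]

# No table: every word is returned as-is.
empty = LookupLemmatizer()
empty_result = [empty.lookup(word) for word in word_list]

# Illustrative entries only; a real table would come from language data.
filled = LookupLemmatizer({"feet": "foot", "foots": "foot"})
filled_result = [filled.lookup(word) for word in word_list]

print(empty_result)   # ['feet', 'foot', 'foots', 'footing']
print(filled_result)  # ['foot', 'foot', 'foot', 'footing']
```

Because the output of the empty lemmatizer equals the input, nothing errors out, which is why the problem is easy to miss.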

Instead you should use something like:

import spacy

# Load the small English model, then build the lemmatizer from its language defaults
nlp = spacy.load("en_core_web_sm")
lemmatizer = nlp.Defaults.create_lemmatizer()
[lemmatizer.lookup(word) for word in word_list]

Also, the command to download the English model at the start of this section is written as spacy -m download en_core_web_sm, when it should be either python -m spacy download en_core_web_sm or spacy download en_core_web_sm.
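If the download is run from inside a notebook, one way to make sure the model lands in the same environment as the kernel is to call the module form through the current interpreter. A minimal sketch, assuming a helper named build_download_command (hypothetical); it only builds the command and leaves running it to the reader:

```python
import sys


def build_download_command(model="en_core_web_sm"):
    # Equivalent to: python -m spacy download <model>,
    # using the interpreter that is currently running.
    return [sys.executable, "-m", "spacy", "download", model]


command = build_download_command()
print(" ".join(command[1:]))  # -m spacy download en_core_web_sm

# To actually run it (requires spaCy to be installed and network access):
# import subprocess
# subprocess.run(command, check=True)
```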

Thanks

EdwardJRoss avatar Jul 12 '19 03:07 EdwardJRoss

thanks for the assistance

elishadammie avatar Jan 15 '20 16:01 elishadammie

I have issues with lecture 2.

nlp = spacy.load("en")

It returns an error.

elishadammie avatar Jan 15 '20 16:01 elishadammie

In spaCy version 2.2.3, the parameterless constructor has been removed. The code now raises this error:

TypeError: __init__() missing 1 required positional argument: 'lookups'
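This is the ordinary Python error for calling a constructor without a required argument. A minimal sketch with a hypothetical stand-in class (not spaCy's Lemmatizer) whose constructor takes a required lookups parameter, as the message suggests:

```python
# Hypothetical stand-in; it mirrors only the constructor signature
# implied by the error message.
class Lemmatizer:
    def __init__(self, lookups):
        self.lookups = lookups


try:
    Lemmatizer()  # no arguments, as in the notebook code
    message = None
except TypeError as err:
    message = str(err)

# The exact prefix of the message varies between Python versions.
print(message)
```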

Thanks for the workaround @EdwardJRoss

gorgi avatar Feb 02 '20 19:02 gorgi

This changed in spaCy v3 (see https://spacy.io/usage/linguistic-features#lemmatization), as described in this discussion: https://github.com/explosion/spaCy/discussions/9235

If you install the package spacy-lookups-data, you can replace the rule-based lemmatizer with a lookup lemmatizer.

pip install spacy[lookups]

This is a workaround for spaCy v3:

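import spacy

# Blank English pipeline with only a lookup-mode lemmatizer
nlp = spacy.blank('en')
# initialize() loads the lookup tables (requires spacy-lookups-data)
nlp.add_pipe('lemmatizer', config={'mode': 'lookup'}).initialize()
[nlp(word)[0].lemma_ for word in word_list]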

huum4n avatar Oct 01 '21 21:10 huum4n

This seems to not immediately fail: https://issueexplorer.com/issue/fastai/course-nlp/3

juliewu-md avatar Nov 13 '21 03:11 juliewu-md