spaCy
"whitelisting" lemmatised as "whiteliste"
As the title says, "whitelisting" is lemmatised as "whiteliste"; the expected lemma is "whitelist".
How to reproduce the behaviour
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> sent = nlp("I am whitelisting this word.")
>>> for tok in sent:
...     print(tok.text, tok.lemma_, tok.pos_)
...
I I PRON
am be AUX
whitelisting whiteliste VERB
this this DET
word word NOUN
. . PUNCT
Your Environment
- spaCy version: 3.8.7
- Platform: Linux-6.8.0-1029-gcp-x86_64-with-glibc2.35
- Python version: 3.12.11
- Pipelines: en_core_web_sm (3.8.0)
A similar lemmatisation issue occurs with spaCy 3.8.8: in "I am riding on my blue bike.", "riding" gets the lemma "rid" (expected: "ride").
This probably belongs in the master thread, though: https://github.com/explosion/spaCy/issues/3052
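Until this is addressed upstream, one possible workaround is to pin the lemma for the affected forms with an attribute_ruler rule. The following is only a sketch, not an official fix: the `LEMMA_OVERRIDES` table and the `add_lemma_overrides` helper are hypothetical names made up for this example, and it runs on a blank English pipeline so that it does not depend on a downloaded model.

```python
import spacy

# Hypothetical override table; the entries are the two forms reported in this thread.
LEMMA_OVERRIDES = {"whitelisting": "whitelist", "riding": "ride"}


def add_lemma_overrides(nlp, overrides):
    """Register a fixed lemma for each lowercase token form in `overrides`."""
    # Reuse the pipeline's attribute_ruler if it has one, otherwise add one.
    if "attribute_ruler" in nlp.pipe_names:
        ruler = nlp.get_pipe("attribute_ruler")
    else:
        ruler = nlp.add_pipe("attribute_ruler")
    for form, lemma in overrides.items():
        # One single-token pattern per form, matched case-insensitively.
        ruler.add(patterns=[[{"LOWER": form}]], attrs={"LEMMA": lemma})
    return nlp


# Blank pipeline: no tagger or lemmatizer, so only the overrides set lemmas.
nlp = add_lemma_overrides(spacy.blank("en"), LEMMA_OVERRIDES)
doc = nlp("I am whitelisting this word.")
for tok in doc:
    print(tok.text, repr(tok.lemma_))
```

With en_core_web_sm the same helper could be called on the loaded pipeline instead. That relies on the assumption that the attribute_ruler runs before the lemmatizer and that the lemmatizer leaves already-set lemmas alone, which is worth checking against your installed version. Note that these rules ignore part of speech, so a noun use of "riding" would also get the lemma "ride".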