
"whitelisting" lemmatised as "whiteliste"

Open • chrisjbryant opened this issue 5 months ago • 1 comment

As the title says, "whitelisting" is lemmatised incorrectly: spaCy returns "whiteliste" where "whitelist" is expected.

How to reproduce the behaviour

>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> sent = nlp("I am whitelisting this word.")
>>> for tok in sent:
...     print(tok.text, tok.lemma_, tok.pos_)
... 
I I PRON
am be AUX
whitelisting whiteliste VERB
this this DET
word word NOUN
. . PUNCT

Your Environment

  • spaCy version: 3.8.7
  • Platform: Linux-6.8.0-1029-gcp-x86_64-with-glibc2.35
  • Python version: 3.12.11
  • Pipelines: en_core_web_sm (3.8.0)

chrisjbryant, Aug 06 '25 12:08

A similar lemmatisation issue occurs with spaCy version 3.8.8: in "I am riding on my blue bike.", "riding" is lemmatised as "rid" (expected: "ride").

This probably belongs in the master thread, though: https://github.com/explosion/spaCy/issues/3052

cyriaka90, Nov 11 '25 09:11
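
The two reports in this thread could be turned into a repeatable check along the following lines. This is only a minimal sketch: `find_lemma_mismatches` is a hypothetical helper, not part of spaCy; the observed (text, lemma) pairs are copied from the outputs reported in this thread; and the expected lemma "whitelist" is an assumption inferred from the issue, while "ride" is the expectation stated in the comment.

```python
from typing import Dict, Iterable, List, Tuple


def find_lemma_mismatches(
    rows: Iterable[Tuple[str, str]],
    expected: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (text, observed_lemma, expected_lemma) for each token whose
    observed lemma differs from the expected one.

    Hypothetical helper for illustration; tokens without an entry in
    `expected` are skipped.
    """
    mismatches = []
    for text, lemma in rows:
        want = expected.get(text)
        if want is not None and lemma != want:
            mismatches.append((text, lemma, want))
    return mismatches


# (text, lemma) pairs as reported in this thread.
observed = [
    ("I", "I"),
    ("am", "be"),
    ("whitelisting", "whiteliste"),
    ("this", "this"),
    ("word", "word"),
    (".", "."),
    ("riding", "rid"),
]

# Expected lemmas: "whitelist" is assumed, "ride" is stated in the comment.
expected = {"am": "be", "whitelisting": "whitelist", "riding": "ride"}

for text, got, want in find_lemma_mismatches(observed, expected):
    print(f"{text}: got {got!r}, expected {want!r}")
```

To run the same check against a live pipeline, the `observed` list could be built from the loop in the reproduction, for example `[(tok.text, tok.lemma_) for tok in nlp("I am whitelisting this word.")]`.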