fastembed
chore: Make PyStemmer optional
PyStemmer is a wrapper around a C library that speeds up stemming in the tokenizer.
I ran the following benchmark with and without it:
import time
from snowballstemmer import stemmer
s = stemmer('english')
text = "This stem form is often a word itself, but this is not always the case as this is not a requirement for text search systems, which are the intended field of use. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem), and over-stemming is more problematic than under-stemming so we tend not to stem in cases that are hard to resolve. If you want to always reduce words to a root form and/or get a root form which is itself a word then Snowball's stemming algorithms likely aren't the right answer."
words = text.split()
loops = 1000
a = time.perf_counter()
for _ in range(loops):
    for word in words:
        stemmed = s.stemWord(word)
print(time.perf_counter() - a)
With PyStemmer: 0.0221869999950286 s
Without PyStemmer: 2.5555163340177387 s
The difference is noticeable (roughly 115x on this benchmark), so instead of dropping PyStemmer we can make it an optional dependency that users install with pip install fastembed[pystemmer]
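As an illustration of the optional-dependency idea, here is a minimal sketch of how calling code could detect whether the C extension is present and fall back otherwise. It assumes only that PyStemmer is importable under the module name Stemmer; optional_import and stemmer_backend are hypothetical helpers for this example and are not part of fastembed.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def stemmer_backend() -> str:
    """Report which stemming backend is available (hypothetical helper)."""
    # PyStemmer installs its C extension under the module name "Stemmer".
    if optional_import("Stemmer") is not None:
        return "pystemmer"
    # Without the extension, a pure-Python stemmer would have to be used.
    return "pure-python"


print(stemmer_backend())
```

With a check like this, a missing PyStemmer costs speed rather than breaking the install, which is the trade-off the benchmark above quantifies.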
According to user reports, PyStemmer crashes on Windows during the installation itself, not at the dependency-resolution stage.
@joein. Review please. Several people are running into this issue.
Hello, any update on this? Our team is running into issues with PyStemmer and we'd like the option for it to be optional as well.
I would also like this dependency to be optional; I'm having issues building PyStemmer.
Same issue here. WSL2. Python 3.12.5. Unable to install the package.
I've been installing version 0.2.7, which has basically the same dependencies as the newest version minus PyStemmer. Then I install the new version with --no-deps to avoid that package. It's been working fine. Please remove that one requirement and make it optional!
Resolved with https://github.com/qdrant/fastembed/releases/tag/v0.4.1