chore: Make PyStemmer optional

Open generall opened this issue 1 year ago • 6 comments

generall avatar Jul 23 '24 20:07 generall

PyStemmer is a wrapper around a C library that speeds up the tokenizer.

I ran the following benchmark with and without it:

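import time
from snowballstemmer import stemmer

# snowballstemmer uses PyStemmer's C implementation automatically when it is
# installed and falls back to its pure-Python stemmers otherwise.
s = stemmer('english')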
text = "This stem form is often a word itself, but this is not always the case as this is not a requirement for text search systems, which are the intended field of use. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem), and over-stemming is more problematic than under-stemming so we tend not to stem in cases that are hard to resolve. If you want to always reduce words to a root form and/or get a root form which is itself a word then Snowball's stemming algorithms likely aren't the right answer."
words = text.split()

loops = 1000
a = time.perf_counter()
for _ in range(loops):
    for word in words:
        stemmed = s.stemWord(word)
print(time.perf_counter() - a)

With pystemmer: 0.0221869999950286 s
Without pystemmer: 2.5555163340177387 s

The difference is noticeable (roughly 115x in this benchmark). Instead of dropping PyStemmer, we can make it an optional dependency that can be installed with pip install fastembed[pystemmer]. According to users' reports, it crashes on Windows during installation itself, not at the dependency resolution stage.
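As a rough illustration of what an optional dependency could look like on the code side, here is a minimal sketch that detects PyStemmer at runtime and falls back to the pure-Python stemmer. The helper names pystemmer_available and load_english_stemmer are hypothetical and are not taken from fastembed; the fallback branch assumes snowballstemmer is installed.

```python
import importlib.util


def pystemmer_available() -> bool:
    """Return True if PyStemmer (imported as the `Stemmer` module) is installed."""
    return importlib.util.find_spec("Stemmer") is not None


def load_english_stemmer():
    """Hypothetical helper: prefer the C-backed stemmer, else use pure Python.

    Both branches return an object that exposes stemWord(word).
    """
    if pystemmer_available():
        import Stemmer  # provided by the optional PyStemmer package

        return Stemmer.Stemmer("english")
    # Assumption: the pure-Python snowballstemmer package is installed.
    from snowballstemmer import stemmer

    return stemmer("english")


if __name__ == "__main__":
    backend = "PyStemmer (C)" if pystemmer_available() else "pure Python"
    print(f"Stemming backend: {backend}")
```

With this kind of check, installing the extra only changes which branch runs; the calling code keeps using stemWord either way.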

joein avatar Jul 26 '24 11:07 joein

@joein. Review please. Several people are running into this issue.

Anush008 avatar Aug 07 '24 08:08 Anush008

Hello, any update on this? Our team is running into issues with PyStemmer and we'd like the option for it to be optional as well.

bendominguez0111 avatar Aug 16 '24 15:08 bendominguez0111

I would also like this dependency to be optional; I'm having issues building PyStemmer.

satyaloka93 avatar Aug 28 '24 18:08 satyaloka93

Same issue here. WSL2. Python 3.12.5. Unable to install the package.

sadaisystems avatar Aug 31 '24 17:08 sadaisystems

I've been installing version 0.2.7, which has basically the same dependencies as the newest version minus PyStemmer. Then I install the new version with --no-deps to avoid that package. It's been working fine. Please remove that one requirement and make it optional!

satyaloka93 avatar Sep 01 '24 14:09 satyaloka93

Resolved with https://github.com/qdrant/fastembed/releases/tag/v0.4.1

Anush008 avatar Oct 22 '24 18:10 Anush008