elasticlunr-rs icon indicating copy to clipboard operation
elasticlunr-rs copied to clipboard

Faster English Porter's Stemmer

Open thecodrr opened this issue 2 years ago • 3 comments

This gets rid of the extremely slow regex based implementation & introduces an all new implementation taken from rust-stem. I had to make a few changes to make it compatible with elasticlunr.

How fast is it exactly?

Old:

create_index            time:   [3.4562 ms 3.4664 ms 3.4838 ms]

New:

create_index            time:   [509.29 µs 511.17 µs 513.40 µs]                         
                        change: [-85.031% -84.680% -84.282%] (p = 0.00 < 0.05)
                        Performance has improved.

All tests are passing without any changes to them or the outputs so this is 100% compatible. I bet this can be further improved but it's a good start.

thecodrr avatar Oct 21 '22 10:10 thecodrr

I remember working on something similar a long time ago but came to the conclusion that it was already fast enough. Would anybody be bothered if it took 100ms to build the search index? I could be wrong.

We'd want to at least add the license header from rust-stem if we went this route.

mattico avatar Nov 10 '22 17:11 mattico

Would anybody be bothered if it took 100ms to build the search index

No harm in having better performance if it's possible without any compromises. The stemmer was one main bottleneck which this PR aims to remove. Having better performance means you can do other more important stuff in the same amount of time. A few ms here and there quickly add up to create a very clunky & slow UX.

If we talk especially about index building then it doesn't matter if you make the Index only once but if you are doing it every few seconds (e.g. in case of generating docs) then it quickly starts to make a difference in your productivity. If we had incremental indexing that'd help alot in that use case but since the index is built from scratch every single time, it needs to be as fast as possible.

We'd want to at least add the license header from rust-stem if we went this route.

Of course. I will make the necessary changes ASAP.

thecodrr avatar Nov 11 '22 06:11 thecodrr

That would be nice for Zola as well

Keats avatar Mar 24 '23 07:03 Keats

Sorry about the delay, this fell off my radar.

mattico avatar Jun 15 '24 17:06 mattico