elasticlunr-rs
elasticlunr-rs copied to clipboard
Faster English Porter's Stemmer
This gets rid of the extremely slow regex based implementation & introduces an all new implementation taken from rust-stem. I had to make a few changes to make it compatible with elasticlunr.
How fast is it exactly?
Old:
create_index time: [3.4562 ms 3.4664 ms 3.4838 ms]
New:
create_index time: [509.29 µs 511.17 µs 513.40 µs]
change: [-85.031% -84.680% -84.282%] (p = 0.00 < 0.05)
Performance has improved.
All tests are passing without any changes to them or the outputs so this is 100% compatible. I bet this can be further improved but it's a good start.
I remember working on something similar a long time ago but came to the conclusion that it was already fast enough. Would anybody be bothered if it took 100ms to build the search index? I could be wrong.
We'd want to at least add the license header from rust-stem if we went this route.
Would anybody be bothered if it took 100ms to build the search index
No harm in having better performance if it's possible without any compromises. The stemmer was one main bottleneck which this PR aims to remove. Having better performance means you can do other more important stuff in the same amount of time. A few ms here and there quickly add up to create a very clunky & slow UX.
If we talk especially about index building then it doesn't matter if you make the Index only once but if you are doing it every few seconds (e.g. in case of generating docs) then it quickly starts to make a difference in your productivity. If we had incremental indexing that'd help alot in that use case but since the index is built from scratch every single time, it needs to be as fast as possible.
We'd want to at least add the license header from rust-stem if we went this route.
Of course. I will make the necessary changes ASAP.
That would be nice for Zola as well
Sorry about the delay, this fell off my radar.