levenshtein-sse icon indicating copy to clipboard operation
levenshtein-sse copied to clipboard

Potential performance improvement

Open NickStrupat opened this issue 6 years ago • 1 comments

Hi! Thanks for the awesome work here. I've been tinkering with it for the past couple of evenings.

I've been testing for potential performance improvements by switching from std::vector & your allocator to using blocks of memory allocated on the stack (via alloca). I'm seeing a 25%-ish improvement for short strings, but about 25%-ish degradation for longer strings. I'm wondering if you have any ideas why that might be?

Also, I haven't yet done a proper multi-thread test. I suspect the contention on _mm_malloc will yield a higher performance improvement when using alloca, since the threads don't need to sync on heap allcoation. If I get some time next week I will try to do that and I'd love to hear what you think of this approach.

The changes I made to achieve the stack allocation are here... https://github.com/NickStrupat/levenshtein-sse/commit/58ee9601942e286f15471a0c6f0f5eda366cdd5f

NickStrupat avatar Apr 13 '18 00:04 NickStrupat

I like the idea, but switching to alloca unconditionally might not be ideal, because it’s completely possible to bring things to a stack overflow using a single call that way…

What we do in Node.js core a lot is using a fixed-size stack-allocated buffer for small strings, and deferring to a larger, heap-allocated buffer for longer ones.

Also, this library is generally better at handling long strings – for short ones, vectorization doesn’t really pay out, and there’s the additional overhead of JS → C++ calls.

addaleax avatar Apr 14 '18 16:04 addaleax