lz-string icon indicating copy to clipboard operation
lz-string copied to clipboard

micro-optimisation: use array lookup instead of charAt?

Open JobLeonard opened this issue 7 years ago • 8 comments

Hey, I was just looking through the base64-string code out of curiosity for how you did it, when I spotted that you use charAt and indexOf on a string: https://github.com/pieroxy/lz-string/blob/master/libs/base64-string.js#L178

My intuition tells me that this is slower than using an array of single character string, so I made these little tests to see if that is true:

http://jsbench.github.io/#cbfad9d07c549532aceb79f8b051553e

On my laptop, encoding via array lookup is slightly faster than using charAt on a string. Decoding is slower (I suspect that for larger strings the binary search hybrid migh have worked, but 64 characters is just so small a list that linear search + under-the-hood optimisations are faster).

There's also the possibility that this slightly reduces memory usage - I don't know if charAt creates a new string every time or not, but using the array method would imply pushing the same immutable string every time, right? (although even if there is a difference, it is probably insignificant).

JobLeonard avatar Mar 24 '17 13:03 JobLeonard

I just updated the tests with a dictionary decode as well, since that is what you use elsewhere too. It's almost 2.5x as fast on my desktop browsers, and in mobile Chrome but no Firefox (identified as "Android 52 browser" in the charts for some reason).

Again, I know it's a micro-op in a larger context so I totally understand if you wouldn't consider this worth changing the code for, but it's nice to have some up-to-date tests at least, no? (assuming I made no mistakes in the code and the benchmarks are valid)

JobLeonard avatar Mar 24 '17 14:03 JobLeonard

It's interesting. Could you push a MR ?

pieroxy avatar Mar 24 '17 16:03 pieroxy

@JobLeonard, it'd be interesting to combine your approach with Map/Set as discussed in #47 (I'm using a locally modified version that's ~2 times faster overall). Also, this is definitely worth implementing if the speed gain is consistent: the reason I'm using a local mod with Map/Set is because vanilla code takes more than 500ms to compress a moderately complex html page (the most popular questions on stackoverflow.com with 10-40 answers) on a very fast pc (i7 4GHz).

tophf avatar Mar 24 '17 21:03 tophf

@pieroxy Sure, I'll make a branch and try.

(Aside, anyone else annoyed by how both jsperf and jsbench repaint the DOM while doing pure-JS benchmarks? I had it lag on a few tests because of DOM reflow calculations, that surely skewed the numbers)

JobLeonard avatar Mar 26 '17 11:03 JobLeonard

Ok, before I do this I have some more testing to do.

You see, it occurred to me that we can also try pairs of characters. In that case, both the dictionary and the binary search options should end up significantly faster, and perhaps faster than the single-char options too.

A pair of chars would require 64*64 = 4096 entries in our array or dictionary, which at two bytes per char and two chars per pair ends up at 16Kib for the array. I don't know what the typical dictionary overhead is.

We could even try 3 chars (262144 permutations) because surely most environments can spare 1536 KiB for a lookup-table these days. If we go up to 4 chars, that would require 16777216 entries of 4 chars, each 16 bits, so (64^4)*4*2 / (1024*1024) = 128 MiB. That's getting rather large, but since 4*6 = 3*8, it would have the nice property of fitting exactly three bytes at a time! That might simplify some bit-shifting code (alternatively, we go with the pairs, then encode/decode two pairs at a time, going through three bytes at a time.

Of course, in order for this to be even plausibly faster we'd need to be sure that splitting into substrings isn't so much slower than single chars that it's going to be a problem. Interestingly, it turns out that Chrome has a special optimisation for str.split('') (there's no other explanation I can think of). Splitting by segments is significantly slower. Surprisingly, on Firefox it's faster to split substrings than single chars!

http://jsbench.github.io/#9cb819bf1ce429575f8535a211f72d5a

http://jsbench.github.io/#e5169a605b641037d62ba4f8bbb8f1d8

http://jsbench.github.io/#d70fc60aeaed28f321133c4c840e6ad6

There's also this comment about typed arrays that I saw a while ago that suggests that using uint32 arrays and bitshifting is faster than uint8 arrays, so I'll look into that too:

https://news.ycombinator.com/item?id=13790675

JobLeonard avatar Mar 27 '17 09:03 JobLeonard

At home on sick leave, finally have some time to look into this again. And it's such a small change too...

Hope it actually improves performance a bit! I'm not sure how to test that properly...

JobLeonard avatar Jun 25 '17 15:06 JobLeonard

Ping

jimmykane avatar May 28 '18 21:05 jimmykane

This is going into the next release (which may be a beta / release-candidate).

Rycochet avatar Jun 29 '18 07:06 Rycochet