Refactor/encode decode via lut
Okay, here we go. This should hopefully finally fix #39 and #37 as well as #35.
I have taken some inspiration from flexpolyline. I think the performance improvement on the encoder side comes mainly from computing the scaled longitude/latitude only once for each new coordinate pair; compare the loop in encode_coordinates(). When we subtract previous from current, previous is already scaled.
EDIT: Seeing the performance gains made in #42, I'd rather attribute the improved encoder performance to removing the string concatenation and pushing directly to the result buffer.
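To make the two encoder-side changes concrete, here is a minimal sketch of the idea. `encode_coordinates()` is the loop referenced above, but the body below, the `encode_value` helper, and the exact signatures are illustrative assumptions, not this PR's literal code:

```rust
// Sketch only: `encode_value` and the exact signatures are hypothetical.
fn encode_coordinates(coords: &[(f64, f64)], precision: u32) -> String {
    let factor = 10u64.pow(precision) as f64;
    let mut output = String::new(); // one shared buffer, no per-value strings
    let (mut prev_lat, mut prev_lon) = (0i64, 0i64);

    for &(lat, lon) in coords {
        // Scale each coordinate exactly once per pair...
        let lat_scaled = (lat * factor).round() as i64;
        let lon_scaled = (lon * factor).round() as i64;
        // ...so the delta is taken against the already-scaled previous value.
        encode_value(lat_scaled - prev_lat, &mut output);
        encode_value(lon_scaled - prev_lon, &mut output);
        prev_lat = lat_scaled;
        prev_lon = lon_scaled;
    }
    output
}

fn encode_value(delta: i64, output: &mut String) {
    // Standard zig-zag + 5-bit chunking, pushed straight into the shared
    // buffer instead of concatenating intermediate strings.
    let mut value = (delta << 1) ^ (delta >> 63);
    while value >= 0x20 {
        output.push((((0x20 | (value & 0x1f)) + 63) as u8) as char);
        value >>= 5;
    }
    output.push(((value + 63) as u8) as char);
}
```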
I have changed the code to use the LUT implementation wherever it was faster on my local machine (M2 Pro), and I chose to leave the respective alternatives in the code. I encourage you to test on your system as well; your results may differ.
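For reference, the kind of decode-side lookup table I mean looks roughly like this; the names (`DECODE_LUT`, `decode_chunk`, the `INVALID` sentinel) and the exact table shape are illustrative assumptions, not necessarily what this branch ships:

```rust
// Sketch only: table shape and sentinel handling are assumptions.
const INVALID: u8 = 0xFF;

// Valid polyline bytes are 63..=126 (each stores `chunk + 63`), so a
// 256-entry table indexed by the raw byte replaces per-character
// arithmetic and doubles as a validity check.
const fn build_lut() -> [u8; 256] {
    let mut lut = [INVALID; 256];
    let mut b = 63;
    while b <= 126 {
        lut[b] = (b - 63) as u8; // 5 data bits plus the continuation bit
        b += 1;
    }
    lut
}

static DECODE_LUT: [u8; 256] = build_lut();

fn decode_chunk(byte: u8) -> Option<u8> {
    match DECODE_LUT[byte as usize] {
        INVALID => None, // not a valid polyline character
        chunk => Some(chunk),
    }
}
```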
The main improvement regarding #39 and #37 comes from explicitly computing and comparing against the maximum shift that would still provide a valid longitude/latitude coordinate.
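In sketch form, that guard works like this; `MAX_SHIFT`, `decode_value`, and the error handling are illustrative assumptions about the shape of the check, not the PR's literal code:

```rust
// Sketch only: the exact bound and error type are assumptions.
fn decode_value(bytes: &mut impl Iterator<Item = u8>) -> Result<i64, &'static str> {
    // A scaled coordinate like 180 * 10^7 fits in ~31 bits, so any input
    // still emitting continuation chunks past this shift cannot decode to
    // a valid longitude/latitude and is rejected instead of overflowing.
    const MAX_SHIFT: u32 = 32;

    let mut accum: i64 = 0;
    let mut shift: u32 = 0;
    loop {
        let chunk = bytes.next().ok_or("unterminated chunk sequence")?.wrapping_sub(63);
        if shift >= MAX_SHIFT {
            return Err("shift exceeds any valid coordinate");
        }
        accum |= i64::from(chunk & 0x1f) << shift;
        shift += 5;
        if chunk & 0x20 == 0 {
            break; // continuation bit unset: value complete
        }
    }
    // Undo the zig-zag encoding.
    Ok((accum >> 1) ^ -(accum & 1))
}
```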
Timings compared to latest main:
```
Running benches/benchmarks.rs (target/release/deps/benchmarks-f896454355081916)

bench encode: 1000 coordinates
        time:   [10.713 µs 10.744 µs 10.777 µs]
        change: [-83.109% -83.025% -82.947%] (p = 0.00 < 0.05)
        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

bench decode: 21502 coordinates
        time:   [277.77 µs 280.55 µs 283.09 µs]
        change: [+51.144% +52.512% +53.845%] (p = 0.00 < 0.05)
        Performance has regressed.

bench polyline6 decoding
        time:   [54.661 ns 54.814 ns 54.986 ns]
        change: [+35.941% +36.468% +36.938%] (p = 0.00 < 0.05)
        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild

bench HUGE polyline6 decoding
        time:   [58.936 µs 59.531 µs 60.308 µs]
        change: [+96.872% +99.934% +103.05%] (p = 0.00 < 0.05)
        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
```
I don't quite know what to say about decoder performance. The added conditionals are surely responsible for the performance hit, but decoding is still rather quick and not as abysmal as other solutions. I assume we'd have to sacrifice a bit of performance for robustness here?
This is really interesting and I think worth pursuing.
My understanding is that you've done at least two big things to speed this up:
- avoided allocating a bunch of strings (like I redundantly did in #42)
- implemented a lookup table
My #42 does only the first part, and isn't quite as fast. But it's still a big boost, around -70% to -75%, and seemingly without any of the related slowdowns, so it feels not at all controversial.
How would you feel about starting by merging #42? Alternatively, you (@mattiZed) could pull the string allocation part out of this PR into a separate PR. I'd be happy to have your name on the commits if you'd prefer that, especially since you wrote it first. 🙃
My motivation is that it might be illuminating to evaluate exactly what we're getting from the LUT without muddying the waters, since seemingly most of the speedup comes from avoiding all the string allocations.
I agree 100%. I will remove the string allocation improvement from this PR so it can be merged first; then merging your PR afterwards should be straightforward. Will do it tomorrow.
I was thinking let's merge the string allocation improvement first, since it's strictly an improvement.
Either is fine for me, will update this PR now.
And here are the new timings:
Most of the encoding gain is/was from pushing to the same buffer throughout, and will be recovered via #42.
```
Running benches/benchmarks.rs (target/release/deps/benchmarks-f896454355081916)

bench encode: 1000 coordinates
        time:   [55.520 µs 55.782 µs 56.073 µs]
        change: [-12.412% -12.012% -11.609%] (p = 0.00 < 0.05)
        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

bench decode: 21502 coordinates
        time:   [280.85 µs 282.70 µs 284.43 µs]
        change: [+56.166% +57.259% +58.311%] (p = 0.00 < 0.05)
        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

bench polyline6 decoding
        time:   [53.460 ns 53.707 ns 53.999 ns]
        change: [+34.154% +34.893% +35.661%] (p = 0.00 < 0.05)
        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

bench HUGE polyline6 decoding
        time:   [65.463 µs 66.191 µs 67.074 µs]
        change: [+135.64% +138.94% +142.48%] (p = 0.00 < 0.05)
        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
```
Would you mind rebasing and running against the new benchmarks now that #42 is merged? I think/hope they'll be a little more stable.
Hi Mike, I'm on vacation this week and next, and will continue working on this afterwards. However, you could also take over if you'd like, or if it's urgent in any sense.
Best, Stefan
Is this LUT implementation something you're still interested in merging, @mattiZed?
Since we've merged a subset of the changes in the meantime, the remaining work is resolving some conflicts.
Hey @michaelkirk, sorry for the late reply. I don't mind; if you'd like to see it merged, I can work on it next week.
I'd like to see it merged if it's significantly faster. Now that the other perf improvements have been merged separately, we should be able to see if the LUT itself is helpful.
Alright, I will prepare it next week.
Will finally dedicate some time to this PR next week!