garnet icon indicating copy to clipboard operation
garnet copied to clipboard

Make ``GeoHash`` coordinate conversions ~40-100x faster

Open PaulusParssinen opened this issue 1 year ago • 9 comments

[!NOTE] Credits for the quantization approach to https://mmcloughlin.com/posts/geohash-assembly (by @mmcloughlin) and de-quantization method to https://github.com/georust/geohash, big thank you for everyone contributing to both efforts!

This PR originally was meant to be simply tiny optimization to avoid unnecessary heap allocations on the coordinate conversions but I had change of plans when I discovered an article about much faster (de)quantization method by Michael McLoughlin and this ended up being GeoHash class rewrite. Please see his great article and explanation.

Much of the commits after https://github.com/microsoft/garnet/pull/348/commits/564891a01678cd8b7b6a4772e10f37d838e076ba is me discovering new exciting details about IEEE-754 floating-point rounding, confusing reference implementations (it turns out, there's no single correct way to reference a location on earth 😅) and other fun Geohash corner-cases.

However, more notably there's tiny error in the quantization approach described in the McLoughing's excellent article which I describe in this comment and fix in f364a96c..b4ec75c. Gist of it is that in IEEE-754 double precision format, the exponent assumption of 1023 does not hold for 2.0, which the maximum coordinate values get rounded to.

There should be no other behavioral differences than the last character of textual representation, which one of the existing tests already ignored understandably due to nuances around the precision of it. The textual representation ~~is~~ should simply be base-32 encoded (custom geohash alphabet) of the integer that represents a specific GeoHash where more characters means more bits of precision. Every encoded character gives 5-bits of precision. Very simple right! I wish it was

Garnet (and Redis) chose 52-bit precision. Why? Because we want to store the geohash integer in a sorted set, which allows us to utilize the geohash structure and do really efficient range queries. Now because we want to reuse the existing sorted set we need to safely store the geohash integer in IEEE-754 double precision score which we can do that safely when our integer is in range [(-2)^53. 2^53].

Now, we can still encode this 52-bit precision integer in the base-32 alphabet just fine. The remaining 55-52=3 bits will be zero and that's what Garnet has done, but Redis seems to zero out the last 2 bits and output the last character of the textual representation as 0.

Redis made another choice too, which is that they chose to follow EPSG:3875 where:

  • Valid longitudes are from -180 to 180 degrees.
  • Valid latitudes are from -85.05112878 to 85.05112878 degrees.

This makes sense as the poles cause all sorts of problems in GIS applications.

Confusing history behind Web Mercator

image

These limits however do not apply to "standard" geohash format which valid latitudes are [-90, 90]. This difference also adds nuance to geohash implementations. Garnet currently accepts coordinates per the geohash limitations.

This PR also introduces AsciiUtils in Garnet.common as place to share common ASCII manipulation logic (acts more like polyfill for Ascii class in .NET 6 and a place where we can optimize further than what BCL's currently does).

This PR is strictly focused on GeoHash class alone. I'll remove more unnecessary allocations in the actual RESP protocol parsing & encoding in follow-up. There's some vectorization opportunities here and the distance math can be definitely improved too.

Benchmarks (Tiger Lake, i5-1135G7)

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
11th Gen Intel Core i5-1135G7 2.40GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.204
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 6 : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Server=True  

main

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 278.3 ns 0.51 ns 0.43 ns 0.0191 80 B
GeoToLongValue .NET 8 260.0 ns 0.85 ns 0.80 ns 0.0191 80 B
GetCoordinatesFromLong .NET 6 429.1 ns 1.30 ns 1.22 ns 0.0801 336 B
GetCoordinatesFromLong .NET 8 213.8 ns 0.85 ns 0.75 ns 0.0801 336 B
GetGeoHashCode .NET 6 810.1 ns 4.79 ns 4.24 ns 0.2537 1064 B
GetGeoHashCode .NET 8 619.1 ns 1.88 ns 1.76 ns 0.2537 1064 B

PR

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 5.556 ns 0.0156 ns 0.0146 ns - -
GeoToLongValue .NET 8 1.888 ns 0.0156 ns 0.0130 ns - -
GetCoordinatesFromLong .NET 6 5.074 ns 0.0038 ns 0.0031 ns - -
GetCoordinatesFromLong .NET 8 1.215 ns 0.0038 ns 0.0034 ns - -
GetGeoHashCode .NET 6 15.944 ns 0.2487 ns 0.2077 ns 0.0014 48 B
GetGeoHashCode .NET 8 15.933 ns 0.0331 ns 0.0294 ns 0.0012 48 B

Benchmarks (Zen 2, Ryzen 7 3700X)

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2
  .NET 6 : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2

Server=True

main

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 290.1 ns 3.08 ns 2.88 ns 0.0095 80 B
GeoToLongValue .NET 8 269.4 ns 2.14 ns 1.90 ns 0.0095 80 B
GetCoordinatesFromLong .NET 6 510.0 ns 9.71 ns 9.54 ns 0.0401 336 B
GetCoordinatesFromLong .NET 8 195.3 ns 3.94 ns 3.49 ns 0.0401 336 B
GetGeoHashCode .NET 6 877.3 ns 6.87 ns 6.42 ns 0.1268 1064 B
GetGeoHashCode .NET 8 525.6 ns 6.46 ns 6.04 ns 0.0982 824 B

PR

Method Job Mean Error StdDev Gen0 Allocated
GeoToLongValue .NET 6 5.646 ns 0.0306 ns 0.0255 ns - -
GeoToLongValue .NET 8 5.829 ns 0.0537 ns 0.0502 ns - -
GetCoordinatesFromLong .NET 6 5.507 ns 0.0583 ns 0.0545 ns - -
GetCoordinatesFromLong .NET 8 5.928 ns 0.0423 ns 0.0396 ns - -
GetGeoHashCode .NET 6 16.286 ns 0.0968 ns 0.0808 ns 0.0003 48 B
GetGeoHashCode .NET 8 15.721 ns 0.1790 ns 0.1587 ns 0.0003 48 B
Zen 2 PDEP/PEXT "erratum" in action
Method Job Mean Error StdDev Allocated
GeoToLongValue .NET 6 w/ UsePdepPext 64.98 ns 0.132 ns 0.103 ns -
GeoToLongValue .NET 8 w/ UsePdepPext 64.95 ns 0.383 ns 0.359 ns -
GetCoordinatesFromLong .NET 6 w/ UsePdepPext 62.87 ns 0.137 ns 0.128 ns -
GetCoordinatesFromLong .NET 8 w/ UsePdepPext 62.92 ns 0.090 ns 0.084 ns -

See that 48B heap allocation for the final string in GetGeoHashCode, we don't need that :)

PaulusParssinen avatar Apr 30 '24 21:04 PaulusParssinen

CanEncodeAndDecodeCoordinates

All values computed values printed with ToString("F16") (except the updated row, which is F14..)

Label Latitude Longitude Lat Error (vs. input) Lon Error (vs. input)
Test input 30.5388942218 104.0555758833 - -
Test IEEE-754 repr. 30.5388942217999997 104.0555758832999942 - -
main 30.5388928949832916 104.0555772185325623 0.0000013268167081 -0.0000013352325681
PR (https://github.com/microsoft/garnet/pull/348/commits/1b59a0a4b790ad29162e21ab4b52dd082625ace1) 30.5388915538787842 104.0555745363235474 0.0000026679212155 0.0000013469764468
PR (calculate center) 30.538892894983292 104.05557721853256 0.0000013268167081 -0.0000013352325681

edit: and I finally realize, the de-quantized coordinate represents the minimum value and we get the same coordinate as main branch which represented the center of this error-bounding-box by: $(\text{{minLatitude}} + \frac{{\text{{latitudeError}}}}{2.0}, \text{{minLongitude}} + \frac{{\text{{longitudeError}}}}{2.0})$

Funny in hindsight as I was earlier wondering why those error values were multiplies of each other..

PaulusParssinen avatar May 02 '24 00:05 PaulusParssinen

Premature optimization is ~~root of all evil~~ fun!

Few test failures left, need to dive deeper into some rounding corner-cases.

The diff looks incomprehensible to review, it's pretty much a rewrite 😅

PaulusParssinen avatar May 02 '24 02:05 PaulusParssinen

Added USE_PDEP_PEXT to allow those consumers who want to extract every bit of perf. for their Geo commands (and who know their CPU has the non-microcode-emulated version). SharpLab (Entire encoding/decoding is inlined into one method 😋)

PaulusParssinen avatar May 03 '24 00:05 PaulusParssinen

adds "Investigate LightClient implementation" to todo list. It seems run some tests until timeout if it receives the expected amount of bytes

For some reason, I remembered/thought that the hash integer was bit-cast i.e. BitConverter.DoubleToUInt64Bits to the sorted set instead of just cast. That explains the 52-bits and I'll cancel the precision change plans..

I just realized the original implementation was not actually redis "52-bit" precision either 🤔 Redis seems to always "clear" out the last 2 bits for the textual representation output, so it always returns 0 as last character. Before this PR, there was already difference between the textual representation between Garnet and Redis.

PaulusParssinen avatar May 04 '24 20:05 PaulusParssinen

There's small corner-case error in the McLoughin's quantization trick on which this PR is based.

The statement

Since y is in the range [1,2], the largest power of two less than or equal to y is 1.0 and the exponent field will always be 1023.

is wrong for the upper-bound of the range. Exponent bias is 1023 for [1.0, 2.0) and exponent for 2.0 flips over to 1024. This means that 90.0 and 180.0 that are clamped to the range maximum 2.0, have their signicand (and subsequently, the geohash integer repesentation) zeroed and dequantization will give equivalent output as -90.0 and -180.0 respectively.

There's a lot of problems with GeoHash in general around the corner values where implementations differ..

Humor

GCQ2uv5bwAEQgju

PaulusParssinen avatar May 06 '24 14:05 PaulusParssinen

And to add to then confusion, the original latitude limits are "wrong". According EPSG:3857 (also known as EPSG:3785, also known as EPSG:900913.. etc.) the latitude limits are -85.05112878 to 85.05112878.. not -90 to 90. Fun! I'm starting to see why the implementation are all over the place as the GeoHash standard uses different limits than the most common web projection.

I'll won't touch the existing latitude limit in this PR.

PaulusParssinen avatar May 07 '24 21:05 PaulusParssinen

Interesting stuff! Glad to see you are making progress here. :)

badrishc avatar May 08 '24 00:05 badrishc

Added benchmarks from my laptop which has AVX512F by which we guard the PEXT/PDEP path. This made both encoding/decoding of geohashes integers go from ~5.5ns to ~1.5ns which was more than I expected. It may be worthwhile to enable more platforms with static readonly Zen 2 check by doing X86Base.CpuId(0, 0) and X86Base.CpuId(1, 0) and check if AuthenticAMD && family < 0x19 to guard Pre-Zen 3.

Also wrote little bit more about the PR to its description.

PaulusParssinen avatar May 08 '24 20:05 PaulusParssinen

Added more background to the PR description about the nuances of geohash..

PaulusParssinen avatar May 08 '24 23:05 PaulusParssinen

Out of curiosity, what required a force-push? 😅 (This broke the history experience in GitHub UI. I have still this branch in it's pre-force-push sync state if we want to restore that.)

PaulusParssinen avatar May 21 '24 10:05 PaulusParssinen

Out of curiosity, what required a force-push? 😅 (This broke the history experience in GitHub UI. I have still this branch in it's pre-force-push sync state if we want to restore that.)

No changes, actually. I was updating the branch with rebase option to bring in latest changes from main.

yrajas avatar May 21 '24 15:05 yrajas