valhalla
valhalla copied to clipboard
Even Faster Edge Candidate Search for Loki and Meili
Long ago in a land far away, we wrote the code which "bins" all the edges in the graph into a grid that is finer than the smallest tile that we support. These bins are .05x.05 degrees in size which means at the equator (ie worst case) a bin covers 5566m in width/height.
Because of this coarseness, meili, the map matching component, creates a secondary cache at 100mx100m grid on the fly and must deal with purging this etc. It does this because even though the algorithm in loki is fast for looking up coordinates, it is not fast enough for mapmatching with a large number of points. We've done experiments on letting loki do this work for meili and the performance difference were roughly an order of magnitude (after the secondary cache was warmed).
This is annoying because why would we want to maintain another cache and another code path for doing the same thing, right?
Recently I was talking to @merkispavel about how the bins work, what they do and how they are used by loki/meili. We were specifically talking about how the sorting of the bins affects the performance of the algorithms that iterate over them (https://github.com/valhalla/valhalla/pull/2515#discussion_r466685873). As I was explaining this to him i mentioned that in addition to the sorting proposed here, potentially other sorting could make sense, specifically if we sort the bins in a spatial way at a smaller resolution. this isnt really possible because there is no way to tell the boundaries (where one partition ends and another begins) in the sorted list of IDs but a different idea hit me..
The GraphIDs have 18bits spare. Previously I had considered storing reachability information in these bits but precomputing that is rather difficult with the ability to change the accessabilty of a given mode of transport on the fly.. So what if we used the extra bits instead as a mask to tell what subregions of the bin a given edge crosses?
Since we have 18bits spare, we could further chop the .05x.05 degree bin into 16 smaller pieces (4x4). Each cell in this sub-bin would have a bit that we could flip in the mask on the extra 16 bits of every edges edge id. The algorithm would work similar to the way it does now, but instead of bins we would want to iterate over sub bins in closest first order. Technically this change would be backwards compatible because the old code wouldnt care about the spare bits being used and we could make new code fall back to the old algorithm when it detects that the sub bins arent set (ie the spare bits are set to 0).
Doing this would reduce the maximum width/height from 5566m down to 1391m at the equator (worst case). Its still not 100m, but 16x less area covered. This should speed up loki::Search and hopefully make it fast enough to use in place of the secondary finer cache we have in meili::candidate_search