maplibre-gl-js icon indicating copy to clipboard operation
maplibre-gl-js copied to clipboard

Partial updates for GeoJSON sources

Open westinrm opened this issue 3 years ago • 9 comments

Motivation

When handling medium to high scale animated GeoJSON data in Mapbox there are quite a few bottlenecks.

  • Sending the entire GeoJSON string from the main thread to the web worker on every update, no matter how small
  • Reprojecting, simplifying, and building vector tiles for every feature even when only a subset changes
  • Invaliding and re-rendering all vector tiles on the main thread when only a subset of tiles change

A good example of this data is https://opensky-network.org/ (Live airplane tracking)

I've internally developed something like this before and saw improved performance just from making the web worker <--> main thread invalidations even without pushing partial updates through geojson-vt and the rest of the GeoJSON worker.

Design Alternatives

  • Do nothing in Maplibre and require API consumers that have medium/high scale animation requirements to write a custom layer. The advantage of this is that a custom layer can be specialized in such a way that it can theoretically perform better. The main disadvantage is such layers are unlikely to be well integrated with the styling language, and that for API consumers it's far more complex to write a high performance custom layer than it is to consume an API used for one.
  • Create an entirely new mapbox source implementation built from the ground up handle high scale data, with an API that uses a more packed data format than GeoJSON. While this has a higher performance potential than building on top of the existing GeoJSON source it also has an extremely high upfront cost and is likely more difficult for users who already use the GeoJSON source to migrate over to.

Design

I propose we add an API for partially updating the contents of GeoJSON source. Initially these partial updates would only be used to reduce the amount of data sent from the main thread to the web worker and to avoid redrawing tiles that haven't changed.

While not immediately required introducing a partial API also opens up the possibility for future performance improvements such as only re-running quantization and reprojection when the feature changes, only re-encoding vector tile properties that change, etc.

Mock-Up

I propose adding the following method to GeoJsonSource:

    type GeoJsonFeatureId = number | string;
    interface GeoJsonSourceDiff {
        removeAll?: boolean,
        removed?: Array<GeoJsonFeatureId>,
        add?: Array<GeoJsonFeature & {id: GeoJsonFeatureId}>,
        update?: Array<GeoJsonFeatureDiff>,
    }
    interface GeoJsonFeatureDiff {
        id: GeoJsonFeatureId,
        newGeometry?: GeoJsonGeometry,
        removeAllProperties?: boolean,
        removeProperties?: Array<string>,
        addOrUpdateProperties?: Array<{key: string, value: any}>,
    }
    /**
     * Performs a partial update of per-feature data stored in this GeoJSON source.
     */
    updateData(diff: GeoJsonSourceDiff);

Concepts

Not entirely sure what would be relevant for this section for this feature request / enhancement.

Implementation

An initial implementation of this JS would look like:

  1. User calls geoJsonSource.updateData
  2. The diff is applied to the main thread's copy of the feature state
  3. The diff is serialized as JSON and sent to the web worker
  4. The web worker marks the prior location of updated/removed features and the new location of updated/added features as dirty. The set of changed can be stored in a limited depth quad tree rolling up homogenous branches to keep this data structure small.
  5. The web worker applies the diff to the worker's feature state and runs geojsonvt on it. In the future this could be improved to avoid rebuild the geojsonvt structure.
  6. The web worker sends the invalidation structure generated in (4) back to the main thread.
  7. The main thread invalidates any loaded tiles that intersect the invalidation structure and re-renders them.

In C++ this code looks a lot different since there is no web worker. In this case the diff just has to be applied on the main thread and the modified tiles invalidated. No serialization required.

One edge case that I'm unsure of how to handle is what the behavior should be if setData is used in addition to updateData. setData accepts multiple features with the same ID, and it accepts features without an ID at all. Both of these make it impossible for setData to just call updateData under the hood. I can think of three ways to address this:

  1. Prohibit the use of both APIs at the same time on a single source. If setData is used, and then updateData, then updateData throws.
  2. Both APIs are backed by separate data structures and the exposed features are the union of both data structures.
  3. Features with a unique ID in setData can be updated using updateData, but all other features are stored in a separate data structure.

Slightly related to #96

westinrm avatar May 17 '22 21:05 westinrm

I'm willing to do the work to implement this (and some of the follow up improvements to push partial updates into geojsonvt) assuming it's something that belongs in this repository.

westinrm avatar May 17 '22 21:05 westinrm

Interesting, thanks! @JannikGM you also do animations, right?

wipfli avatar May 18 '22 19:05 wipfli

I think this is an interesting direction. Few things to consider:

  1. Allow defining the mapping from ID to feature, this way in theory you can define the ID to be various properties or even a hashing function of some sort.
  2. geojsonvt is not maintained by this community and it is still a mapbox library so pushing improvements there might be harder (we can fork if needed obviously)

How large is the geojson you are talking about? I'm using a geojson with 20k features and the current code does it very fast...

HarelM avatar May 20 '22 07:05 HarelM

(1) Interesting thought. I think it might be a bit confusing since you'd need to always include the ID contributing properties on the updates so it can be mapped with the correct already existing feature. (2) You can actually do a lot of this without editing geojsonvt directly by extended the geojsonvt class and calling the methods directly. Since it's not public API it's arguable if this is a good idea over attempting to contribute upstream or forking.

We were looking at roughly 10k points and 10k 10 vertex lines (imagine 10k live point features with a short trail behind each one). Each feature updates roughly once every minute or so and you get an update message about once a second. These features have rather complex styling rules via rather complex properties. While it doesn't take that long to update (I'll confirm Monday, but I recall it taking ~10 seconds) it is prohibitively expensive to attempt to update every tile and feature for every update message.

westinrm avatar May 21 '22 07:05 westinrm

  1. You can always have a default ID in case no such method is provided (default function).
  2. I would avoid as much as possible relaying on internal fields and properties since these can break, but as a POC to see that this is a valid direction sure, why not.

HarelM avatar May 22 '22 08:05 HarelM

👋 Hi folks. So, this is something that I've sort of implemented but haven't shipped, related to Placemark. I have some notes based on that experience so far:

  • GeoJSONVT is fundamentally a top-down library that starts from the 0/0/0 tile. That makes it not a perfect solution for this kind of problem: if you're zoomed in, you might want to only generate the tile you're looking at rather than generate the tree from the top down.
  • At least GL uses flatbush as an index - not clear if maplibre-gl-js does? Anyway, it'd be important to, if the index is something static like flatbush, weigh whether an updatable index would be faster.

Regarding IDs and diffing:

  • In my opinion, IDs should be required and differential updates should be explicit - an API like map.updateFeature(id, newData) rather than a full-data call. This is because (a) diffs are themselves a potential bottleneck, and (b) fast diffs would use ids anyway.
  • And if you have a decent imperative API, you can build a diff library that generates the right calls for it.
  • An important consideration is ordering. The order of features in an initial featurecollection input is important, and diffs might change that order. Having an order key in the data, or a insertAfter parameter in the .updateFeature call seem like the most immediately possible ways to solve that.

tmcw avatar May 23 '22 16:05 tmcw

GeoJSONVT is fundamentally a top-down library

GeoJSONVT is 3 components. (1) reprojection, (2) indexing/clipping, and (3) tile encoding. The only part of this that is top down is (2) since its building a quadtree. (1) can easily be partially recomputed, (2) is a little trickier but still possible, and (3) is only run when a tile is requested. We would probably need to make major changes to allow partial updates, but I don't think the approach GeoJSONVT uses is crazy, even with partial updates.

At least GL uses flatbush as an index

Where GL is Mapbox GL JS? I don't think flatbush is necessarily going to be better than pre-clipped quadtree features. If this was arbitrary spatial queries sure, but when the quadtree is tile-aligned I don't think flatbush is notably better.

That said the main proposal here is to implement an API that allows partial updates of the source, and since the main performance issues I found with frequent updates were not related to GeoJSONVT (it was web worker RPC and executing the styles on vector tiles) I don't think figuring out exactly how partial updates would be wired through the the vector tile generation is required.

For IDs and diffing:

  • I'm not proposing that Mapbox would generate diffs, consumers of the API just have to provide their own diff. If they do that by generating a diff manually on the main thread or by pushing partial updates through their frontend application that's up to them. See the Mock-Up section in the initial comment for the proposed API.
  • Ordering can always be handled using the sort-key properties on the layer style.

westinrm avatar May 28 '22 00:05 westinrm

Related to #106 I think, somewhat. Has this been implemented and can be shared?

HarelM avatar Aug 21 '22 14:08 HarelM

The minimal amount of work here (new API + partial updates sent to the worker) would solve #106.

I unfortunately haven't had a chance to work on this recently, but will try to find some time sometime soon.

Equinox- avatar Aug 21 '22 16:08 Equinox-

Fixed by #1605

HarelM avatar Oct 01 '22 03:10 HarelM