shortbread-docs

Decision on including/omitting `id` from MVT tiles

Open zbycz opened this issue 6 months ago • 24 comments

Hi,

I haven't found any mention of the id fields in the documentation.

Is there some consensus on whether omitting id is intentional (perhaps for size benefits)?

Adding the osm_id and an encoded osm_type is rather standard in the OpenMapTiles stack and also in planetiler. It has the benefit that maplibre-gl knows it is the same feature when zooming in, and it also enables fast clickability of features, POIs, etc. (This is currently used e.g. on OsmAPP.org for both Maptiler planet tiles and OpenFreeMap tiles.)

The schema for encoding (see here for more context):

  • In planetiler it is <osm_id>X, where X=1/2/3 for node/way/relation.
  • In the OpenMapTiles stack the scheme is X=0/1/4 instead.
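
A minimal sketch of the two decimal-suffix schemes listed above, assuming the feature id is computed as `osm_id * 10 + X`. The function names and the `SCHEMES` table are hypothetical and chosen only for illustration; neither project exposes them under these names.

```python
# Type suffix digit X per scheme, as listed above.
SCHEMES = {
    "planetiler": {"node": 1, "way": 2, "relation": 3},
    "openmaptiles": {"node": 0, "way": 1, "relation": 4},
}


def encode_feature_id(osm_id: int, osm_type: str, scheme: str) -> int:
    """Append the type digit to the OSM id: <osm_id>X."""
    return osm_id * 10 + SCHEMES[scheme][osm_type]


def decode_feature_id(feature_id: int, scheme: str) -> tuple[int, str]:
    """Split a feature id back into (osm_id, osm_type)."""
    osm_id, digit = divmod(feature_id, 10)
    for osm_type, x in SCHEMES[scheme].items():
        if x == digit:
            return osm_id, osm_type
    raise ValueError(f"unknown type digit {digit} for scheme {scheme!r}")
```

A consumer has to know which scheme produced the tiles, because the same last digit can mean different things: 1 is a node in planetiler but a way in OpenMapTiles.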

//Thanks @SomeoneElseOSM for notifying me in https://community.openstreetmap.org/t/vector-tiles-on-osmf-hardware/121501/95.

zbycz avatar Jun 26 '25 10:06 zbycz

In real-world usage there is often a very good reason not to include OSM ids, which is that it stops features with otherwise identical tags being merged in the output tile. This is a very common tilesize optimisation.

But as you say, for interactive uses it can be worthwhile. So I could see this being an optional part of the Shortbread spec.

The Mapbox Vector Tile field for object ids is a 64-bit unsigned int (the spec doesn't say that, but the .proto does). To store an OSM ID, you obviously need not only the object ID but also the object type (node/way/relation). Encoding this in the topmost two bits of the uint seems the obvious solution to me, but yes, both planetiler and openmaptiles seem to do something else, so ¯\_(ツ)_/¯
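
The "topmost two bits" idea could be sketched roughly as follows. The type codes (0/1/2) and the function names are assumptions made for illustration; nothing in this thread fixes them.

```python
TYPE_BITS = {"node": 0, "way": 1, "relation": 2}  # assumed codes
TYPE_NAMES = {code: name for name, code in TYPE_BITS.items()}
ID_MASK = (1 << 62) - 1  # the low 62 bits hold the OSM id


def pack_id(osm_id: int, osm_type: str) -> int:
    """Store the object type in bits 62-63 of a 64-bit unsigned id."""
    if not 0 <= osm_id <= ID_MASK:
        raise ValueError("osm_id does not fit in 62 bits")
    return (TYPE_BITS[osm_type] << 62) | osm_id


def unpack_id(feature_id: int) -> tuple[int, str]:
    """Recover (osm_id, osm_type) from a packed id."""
    return feature_id & ID_MASK, TYPE_NAMES[feature_id >> 62]
```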

systemed avatar Jun 26 '25 10:06 systemed

The id data would help implement the "Clickable POIs on the frontpage" feature mentioned in the OSM Engineering Working Group’s Top Ten Tasks. https://wiki.openstreetmap.org/wiki/Top_Ten_Tasks#Clickable_POIs_on_the_frontpage

"The goal is to get clickable POIs on the osm.org frontpage. Additional UI features such as highlighting icons on hover could drastically improve the user experience. Note: these days, Vector Tiles would probably cover those requirements more or less out of the box."

ImreSamu avatar Jun 26 '25 10:06 ImreSamu

@systemed - for reference, @msbarry from planetiler already tested the scheme osm_id * 3 + {0 for node, 1 for way, 2 for relation}, which uses even fewer bits than the two topmost ones. He reports that the protobuf varint encoding is a little less effective than with *10, but the developer experience is maybe more important.
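
To make the size trade-off concrete, here is a rough sketch that counts how many bytes a protobuf varint needs under each scheme. Only the `* 3` and `* 10` formulas come from the discussion above; the example id is arbitrary and the type codes for the two-bit scheme are assumed.

```python
def varint_len(n: int) -> int:
    """Bytes needed to encode a non-negative int as a protobuf varint (7 bits per byte)."""
    return max(1, (n.bit_length() + 6) // 7)


def times3(osm_id: int, type_code: int) -> int:
    # type_code: 0 node, 1 way, 2 relation
    return osm_id * 3 + type_code


def times10(osm_id: int, digit: int) -> int:
    # digit: 1 node, 2 way, 3 relation (planetiler)
    return osm_id * 10 + digit


def top_two_bits(osm_id: int, type_code: int) -> int:
    # Assumed type codes: 0 node, 1 way, 2 relation
    return (type_code << 62) | osm_id


osm_id = 1_000_000_000  # arbitrary example way id
for name, value in [
    ("*3", times3(osm_id, 1)),
    ("*10", times10(osm_id, 2)),
    ("top bits", top_two_bits(osm_id, 1)),
]:
    print(f"{name:9} {value:>20} -> {varint_len(value)} bytes")
```

With these assumed inputs both multiplicative schemes need 5 bytes, while setting bit 62 forces a 9-byte varint, because a varint's length depends on the position of the highest set bit.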

@ImreSamu - great to hear that!

zbycz avatar Jun 26 '25 12:06 zbycz

it stops features with otherwise identical tags being merged in the output tile. This is a very common tilesize optimisation.

Very interesting, I was totally unaware of this. Do you have quantitative data? What is the tile size increase when you add OSM IDs?

etienneJr avatar Jun 26 '25 15:06 etienneJr

https://blog.cyclemap.link/2020-02-02-optimizing-vectortiles/ has some data. Merging features can bring tile size down by a factor of four. If you also remove the ids, it's a factor of five.

pnorman avatar Jun 26 '25 16:06 pnorman

I tested the impact of IDs on weight using tilemaker, an OSM extract covering the French Riviera, and config/process files from different sources, including shortbread-tilemaker. Adding IDs on all elements (with the "include_ids": true parameter in config.json) increased the size only by +14%. OK, it's significant, but not as massive as I'd thought from reading your previous messages. https://codeberg.org/cartes/web/issues/1001#issuecomment-5639639

etienneJr avatar Jun 28 '25 22:06 etienneJr

The impact isn't from the IDs themselves.

which is that it stops features with otherwise identical tags being merged in the output tile. This is a very common tilesize optimisation

Adding IDs on all elements (with the "include_ids": true parameter in config.json) increased the size only by +14%

Your numbers are in line with what I linked, which gave 20% for removing IDs alone.

pnorman avatar Jun 29 '25 22:06 pnorman

OK, thanks for confirming that. So, your position is that a 15% increase in the weight of OSMF vector tiles is too much for a minor gain in functionality (making everything clickable), compared to the main goal (minutely updated, all-purpose, lightweight vector tiles)?

etienneJr avatar Jun 30 '25 11:06 etienneJr

@etienneJr That's out of scope for this repo - this issue is for tracking what goes into the Shortbread spec, not any particular deployment on osm.org or any other site. Even if Shortbread doesn't choose to codify it in the spec, it would be possible for osm.org (or any other deployment) to extend their vector tiles to include ids should they so desire.

systemed avatar Jun 30 '25 11:06 systemed

So, your position is that a 15% increase

As repeated multiple times above, including IDs on every object prevents merging. This makes tiles 4x to 5x larger.

pnorman avatar Jun 30 '25 17:06 pnorman

Paul wrote

As repeated multiple times above, including IDs on every object prevents merging. This makes tiles 4x to 5x larger.

I'm sorry to admit that I can't follow your point either. And I'm also admittedly looking for IDs in objects for several reasons.

I read your explanation above:

In real-world usage there is often a very good reason not to include OSM ids, which is that it stops features with otherwise identical tags being merged in the output tile. This is a very common tilesize optimisation.

Can you give a hint or an example of such a merging?

Because given 5 separate nearby benches with otherwise identical tags: why should they be merged? Are you speaking about unimportant features, like nodes with no "main" tag?

And in the cases where merging makes sense (which I believe exist): why can't a tile generator still be extended to ignore ids for specifically configured feature classes?

sfkeller avatar Jun 30 '25 18:06 sfkeller

Because given 5 separate nearby benches with otherwise identical tags: why should they be merged?

Because the MVT representation is more efficient.

Unmerged, it looks like this:

    bench 1 tag 1
    bench 1 tag 2
    bench 1 tag 3
    bench 1 type Point
    bench 1 MoveTo
    [new feature]
    bench 2 tag 1
    bench 2 tag 2
    bench 2 tag 3
    bench 2 type Point
    bench 2 MoveTo
    [new feature]
    bench 3 tag 1
    bench 3 tag 2
    bench 3 tag 3
    bench 3 type Point
    bench 3 MoveTo
    [new feature]
    bench 4 tag 1
    bench 4 tag 2
    bench 4 tag 3
    bench 4 type Point
    bench 4 MoveTo
    [new feature]
    bench 5 tag 1
    bench 5 tag 2
    bench 5 tag 3
    bench 5 type Point
    bench 5 MoveTo

Merged, it looks like this:

    bench tag 1
    bench tag 2
    bench tag 3
    bench type Point
    bench 1 MoveTo
    bench 2 MoveTo
    bench 3 MoveTo
    bench 4 MoveTo
    bench 5 MoveTo

which is a pretty chunky saving. Yet the rendering is exactly the same. See the MVT spec.
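
As a rough illustration of the saving, the sketch below groups point features by their tags and counts how many features and tag references would be written. It models only the grouping, not the actual MVT encoding; the sample data and function names are made up for the example.

```python
from collections import defaultdict

# Five benches with identical tags but different positions and OSM ids.
benches = [
    {
        "id": 100 + i,
        "tags": {"amenity": "bench", "backrest": "yes", "material": "wood"},
        "point": (10 * i, 20),
    }
    for i in range(5)
]


def merge_points(features, keep_ids=False):
    """Group points whose tags (and, optionally, ids) are identical."""
    groups = defaultdict(list)
    for f in features:
        tags_key = tuple(sorted(f["tags"].items()))
        # A unique id in the key makes every key unique, so nothing merges.
        key = (tags_key, f["id"] if keep_ids else None)
        groups[key].append(f["point"])
    return groups


def cost(groups):
    """Return (number of features, number of tag references written)."""
    return len(groups), sum(len(tags_key) for tags_key, _ in groups)


print(cost(merge_points(benches)))                 # merged by tags
print(cost(merge_points(benches, keep_ids=True)))  # ids block the merge
```

In this toy case merging writes one feature with 3 tag references instead of five features with 15, which mirrors the two listings above.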

systemed avatar Jul 01 '25 10:07 systemed

You convinced me, @systemed. So what next?

But as you say, for interactive uses it can be worthwhile. So I could see this being an optional part of the Shortbread spec.

Let's make it optional?

For interactive uses I have a plan B: in the Swiss castle map we implemented a kind of GetFeatureInfo (like in OGC WMS), which changes the cursor on the client side to "clickable" and makes a server API call only when the user clicks: see e.g. here.

sfkeller avatar Jul 01 '25 18:07 sfkeller

This makes tiles 4x to 5x larger

I continued my tests here with a complete study of the impact of tilemaker merging options outlined by @systemed (combine_xxx), with or without including ids, and I really can't reproduce that figure.

including IDs on every object prevents merging.

This is not what I observed. Even in a strategy with all merging options activated (on all layers and all levels), the increase was only +3% when adding ids. It seems to me that tilemaker is able to deal with ids even when merging is activated.

https://blog.cyclemap.link/2020-02-02-optimizing-vectortiles/ has some data. Merging features can bring tile size down by a factor of four.

I am wondering whether this factor is specific to the 2 tiles (and tile procedure) tested in this post?

etienneJr avatar Jul 01 '25 20:07 etienneJr

I am wondering whether this factor is specific to the 2 tiles (and tile procedure) tested in this post?

No. I've gotten similar numbers with multiple tilesets I've done, some OSM-based, some not. It's generally 3x-5x.

pnorman avatar Jul 02 '25 00:07 pnorman

No. I've gotten similar numbers with multiple tilesets I've done, some OSM-based, some not. It's generally 3x-5x.

Whatever the software and the schema?

etienneJr avatar Jul 02 '25 04:07 etienneJr

Whatever the software and the schema?

Yes, this is across multiple ways of generating tiles and multiple schemas.

pnorman avatar Jul 02 '25 05:07 pnorman

Yes, this is across multiple ways of generating tiles and multiple schemas.

OK, so do you have any hypotheses to explain what I observed in my test case? Or any ideas for further tests?

etienneJr avatar Jul 02 '25 07:07 etienneJr

I wondered whether my result was specific to the French Riviera, so I tested Colorado, and got the same result. I wondered if it was a positive effect of gzip compression, so I tested without compression and got the same result.

Please tell me what I did wrong and explain why I can't reproduce the "including IDs prevents merging" behaviour, which seems obvious to everyone.

etienneJr avatar Jul 02 '25 20:07 etienneJr

OK, I've finally understood that "including IDs prevents merging" doesn't describe how the software works, but rather a general principle. With tilemaker, when you request both merging and IDs, merging takes priority, and the IDs are not all added (only 1 ID is kept for all the merged items), see this discussion. This explains why I could get light tiles: with all the merge options enabled and IDs requested, merging is done, and there are in fact very few IDs included. To be sure you get the correct ID for the elements you want to make clickable, you need to remove any merge options that may merge them with similar elements.
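
The outcome described here can be modelled with a small sketch: merge by tags first, then let each merged feature keep a single id. This is not tilemaker's code, only an illustration of the result; which id survives is an assumption (here, the first one encountered).

```python
def merge_keep_one_id(features):
    """Merge features with identical tags; each merged feature keeps one id."""
    merged = {}
    for f in features:
        key = tuple(sorted(f["tags"].items()))
        if key not in merged:
            # Assumption: the first id seen is the one that is kept.
            merged[key] = {"id": f["id"], "tags": f["tags"], "points": []}
        merged[key]["points"].append(f["point"])
    return list(merged.values())


features = [
    {"id": 1, "tags": {"amenity": "bench"}, "point": (0, 0)},
    {"id": 2, "tags": {"amenity": "bench"}, "point": (5, 0)},
    {"id": 3, "tags": {"amenity": "restaurant", "name": "Chez Anne"}, "point": (9, 9)},
]
for m in merge_keep_one_id(features):
    print(m["id"], m["tags"], len(m["points"]))
```

The two benches collapse into one feature carrying only id 1, while the named restaurant stays separate and keeps its own id.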

etienneJr avatar Jul 07 '25 10:07 etienneJr

Thanks @etienneJr for debugging it!

From my point of view, it would be great if shortbread could use exactly this strategy. POIs with different names would then always get the correct ID. If merging occurs anyway, it would be quite easy to query (e.g. Overpass) for nearby features with the same tags.

One more idea to brainstorm: perhaps a merged feature could get a different type id, to warn consumers of uncertain results, e.g. X=5/6/7 for merged items?
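
This brainstormed variant could look like the following sketch, assuming the planetiler-style `osm_id * 10 + X` layout with X=1/2/3 for unmerged features and the proposed X=5/6/7 for merged ones. The offset of 4 and the function names are assumptions; no such convention exists yet.

```python
BASE_DIGIT = {"node": 1, "way": 2, "relation": 3}
TYPE_BY_DIGIT = {digit: name for name, digit in BASE_DIGIT.items()}
MERGED_OFFSET = 4  # hypothetical: 5/6/7 mark a merged node/way/relation


def encode(osm_id: int, osm_type: str, merged: bool = False) -> int:
    """Build <osm_id>X, using 5/6/7 instead of 1/2/3 for merged features."""
    digit = BASE_DIGIT[osm_type] + (MERGED_OFFSET if merged else 0)
    return osm_id * 10 + digit


def decode(feature_id: int) -> tuple[int, str, bool]:
    """Return (osm_id, osm_type, merged)."""
    osm_id, digit = divmod(feature_id, 10)
    merged = digit > MERGED_OFFSET
    if merged:
        digit -= MERGED_OFFSET
    if digit not in TYPE_BY_DIGIT:
        raise ValueError(f"unknown type digit in id {feature_id}")
    return osm_id, TYPE_BY_DIGIT[digit], merged
```

A consumer could then treat an id flagged as merged as a hint only, and look up nearby features with the same tags before showing details.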

zbycz avatar Jul 07 '25 11:07 zbycz

I've been thinking about how to make feature merging and ID inclusion compatible. I've thought of a solution, but I imagine it would involve redefining the MVT specification? The idea would be to group identical tags together but list the IDs separately, in this way:

    bench tag 1
    bench tag 2
    bench tag 3
    bench type Point
    bench 1 ID
    bench 1 MoveTo
    bench 2 ID
    bench 2 MoveTo
    bench 3 ID
    bench 3 MoveTo
    bench 4 ID
    bench 4 MoveTo
    bench 5 ID
    bench 5 MoveTo

What do you think?

etienneJr avatar Jul 08 '25 20:07 etienneJr

MVT changes are off-topic for here. If a new tile format comes out we can consider if it changes anything we are doing, but this is not the place to design a new format.

pnorman avatar Jul 08 '25 23:07 pnorman

I understand that several POIs with otherwise the same tags (e.g. benches) can be stored in MVT much more space-efficiently. Adding (OSM) ids would break that. For POIs that don't have the same tags because they already have different names (e.g. restaurants), there is no space-saving.

So it makes sense to me to allow OSM ids on the kinds of POIs that usually have a name, i.e. shops, offices, amenities like childcare, etc., because this would make it much easier to make the map interactive (click on a restaurant to retrieve details about it). Being able to click on benches, trash cans, street lighting, trees, bollards, etc. I consider less useful for the typical use case of a general map. (Not without use, of course, just perhaps not for a general-purpose map, and thus not worth the extra space it consumes.)

A list of the kinds of POIs that usually have a name (and can be entered) is maintained in StreetComplete here: https://github.com/streetcomplete/StreetComplete/blob/master/app/src/commonMain/kotlin/de/westnordost/streetcomplete/osm/Places.kt#L34

I.e. all shop, club, craft, healthcare, office and some amenity, emergency, leisure, military, tourism.
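
Such a rule could be sketched as a simple tag filter. The keys that always qualify come from the list above; the per-key value sets are placeholders, since the list only says "some" values for `amenity`, `tourism` and the others, and the StreetComplete file remains the authoritative source.

```python
# Keys for which every feature would keep its OSM id (from the list above).
ALWAYS_KEYS = {"shop", "club", "craft", "healthcare", "office"}

# Placeholder values only; the real selection lives in StreetComplete's Places.kt.
SOME_VALUES = {
    "amenity": {"restaurant", "childcare"},
}


def should_keep_id(tags: dict[str, str]) -> bool:
    """Decide whether a feature is a 'place' that should carry its OSM id."""
    if any(key in tags for key in ALWAYS_KEYS):
        return True
    return any(tags.get(key) in values for key, values in SOME_VALUES.items())


print(should_keep_id({"shop": "bakery"}))    # a shop keeps its id
print(should_keep_id({"amenity": "bench"}))  # a bench stays mergeable
```

Features for which the filter returns False would carry no id and therefore remain free to be merged.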

westnordost avatar Jul 23 '25 10:07 westnordost