
PBF change for big data

Open jonathanl-telenav opened this issue 7 years ago • 6 comments

Currently, to determine the bounding rectangle of each way in a PBF that we're splitting into pieces, we keep a map from node id to location in memory. Unfortunately, this takes more and more memory as the OSM data set grows (success brings certain problems!). A really useful change from my perspective would be to include the bounding rectangle of each way in the data, so it's easy to determine where a way is without the giant node->location map. With some thought from the technical leaders of the project, this could probably be done efficiently (perhaps the bounding rectangle could be stored in two long values, and node storage reduced from two doubles to a single long, resulting in minimal file size growth) and in a backwards-compatible way (perhaps the version could be detected from the PBF header). Thanks for listening! -- Best, Jon
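To make the "two long values" idea concrete, here is a minimal sketch of one possible encoding. It assumes coordinates are stored as fixed-point integers at 1e-7 degree resolution, so that latitude and longitude each fit in 32 bits and a location fits in one 64-bit value. The function names are hypothetical and not part of the PBF format or any existing tool, and the naive min/max does not handle ways that cross the antimeridian.

```python
SCALE = 10_000_000  # assumed fixed-point resolution: 1e-7 degrees


def pack_location(lat, lon):
    """Pack a (lat, lon) pair into a single 64-bit integer.

    Latitude goes in the high 32 bits, longitude in the low 32 bits,
    each stored as a two's-complement 32-bit fixed-point value.
    """
    lat_i = round(lat * SCALE)
    lon_i = round(lon * SCALE)
    return ((lat_i & 0xFFFFFFFF) << 32) | (lon_i & 0xFFFFFFFF)


def _signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - (1 << 32) if value & 0x80000000 else value


def unpack_location(packed):
    """Inverse of pack_location; returns (lat, lon) in degrees."""
    lat_i = _signed32((packed >> 32) & 0xFFFFFFFF)
    lon_i = _signed32(packed & 0xFFFFFFFF)
    return lat_i / SCALE, lon_i / SCALE


def way_bounds(locations):
    """Return a way's bounding rectangle as two packed values.

    The first value is the (min lat, min lon) corner, the second the
    (max lat, max lon) corner.
    """
    lats = [lat for lat, _ in locations]
    lons = [lon for _, lon in locations]
    return (pack_location(min(lats), min(lons)),
            pack_location(max(lats), max(lons)))
```

A reader that only needs to know which piece a way falls into could then compare the two packed corners against each piece's extent without looking up any node.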

jonathanl-telenav avatar Feb 28 '18 17:02 jonathanl-telenav

IMHO, there is a similar idea that osmium already supports:

  • https://blog.jochentopf.com/2016-04-20-node-locations-on-ways.html
  • http://docs.osmcode.org/osmium/latest/osmium-add-locations-to-ways.html

ImreSamu avatar Feb 28 '18 18:02 ImreSamu

Looping in @joto

mvexel avatar Feb 28 '18 19:02 mvexel

Including the bounding rectangle would help with only one specific use-case, namely creating some kind of geographical extract of the data. Adding the node locations to the ways as implemented by me and mentioned by @ImreSamu above allows a lot more uses. Yes, it has somewhat more overhead, but if you leave out all the nodes that don't have tags (osmium add-locations-to-ways can do this), the resulting file has a similar size to the original planet file. So I think this is the better solution.
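As a rough illustration of why locations on ways remove the need for the node map, the sketch below contrasts the two approaches on plain Python data. The record shapes (`way_refs`, `way_nodes`) and function names are assumptions made for the example and do not reflect the osmium or Osmosis APIs.

```python
from typing import Dict, List, Tuple

Location = Tuple[float, float]  # (lat, lon) in degrees


def _bounds(points: List[Location]) -> Tuple[float, float, float, float]:
    """Return (min_lat, min_lon, max_lat, max_lon) for a list of points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats), min(lons), max(lats), max(lons))


def bounds_with_node_map(way_refs: List[int],
                         node_locations: Dict[int, Location]):
    """Classic approach: every node location must already be in memory."""
    return _bounds([node_locations[ref] for ref in way_refs])


def bounds_with_embedded_locations(
        way_nodes: List[Tuple[int, float, float]]):
    """Locations-on-ways approach: each way carries (ref, lat, lon),
    so it can be processed on its own while streaming the file."""
    return _bounds([(lat, lon) for _, lat, lon in way_nodes])
```

The second function needs nothing beyond the way itself, which is what lets a reader stream ways one at a time. The bounding rectangle is just one thing that can be derived from the embedded locations, which is the sense in which this approach covers more uses.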

@jonathanl-telenav: Have a look at this and tell us if this helps you at all.

joto avatar Mar 08 '18 13:03 joto

@joto: that approach doesn't generalize to full history files, does it?

tyrasd avatar Mar 08 '18 14:03 tyrasd

@tyrasd No, this can't work on full history files, because a node might change while the way using it didn't change. In effect we would have to make two way versions out of the one way version, one with the old node location, one with the new one. But this would give us two ways with the same version number.

In theory we could change the timestamp on the way and leave the version number. This way we could still tell the two way versions apart, but this would break if there are several changes in the same second. And it would break the assumption that (object-type, id, version) is unique in history files.
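The problem can be shown with a toy history, sketched below under made-up data: one way that was never edited and one of its nodes that moved later. The tuples, field names, and helper functions are hypothetical and only serve to illustrate the key collision.

```python
# Toy node history: (node id, version, timestamp, lat, lon).
node_history = [
    (1, 1, 100, 10.0, 20.0),
    (1, 2, 200, 10.5, 20.0),  # node 1 moved at t=200
    (2, 1, 100, 11.0, 21.0),
]

# The way was created at t=100 and never edited afterwards.
way = {"id": 7, "version": 1, "timestamp": 100, "refs": [1, 2]}


def location_at(node_id, t):
    """Return the most recent location of a node at or before time t."""
    candidates = [(ts, lat, lon)
                  for nid, _, ts, lat, lon in node_history
                  if nid == node_id and ts <= t]
    _, lat, lon = max(candidates)  # latest timestamp wins
    return (lat, lon)


def way_geometries(way):
    """Emit one geometry for each moment the way's shape changed.

    Every entry is keyed by (object-type, id, version) of the way,
    which is exactly the key that history files assume is unique.
    """
    change_times = sorted(
        {way["timestamp"]}
        | {ts for nid, _, ts, _, _ in node_history
           if nid in way["refs"] and ts >= way["timestamp"]}
    )
    key = ("way", way["id"], way["version"])
    return [(key, t, [location_at(ref, t) for ref in way["refs"]])
            for t in change_times]


geometries = way_geometries(way)
```

Here `geometries` holds two entries with different node locations but the same `("way", 7, 1)` key, so only the timestamp tells them apart.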

joto avatar Mar 08 '18 14:03 joto

I have modified Osmosis (posted a patch to the osmosis-dev list) to work with the add-locations-to-ways osmium output and it works really well. We're able to read planet files with much less memory use and in less time as well (due mainly to inefficiencies in managing giant maps).

We wouldn't really want PBFs with the untagged nodes left out entirely, because we sometimes (rarely, but often enough to care) need other metadata on those nodes, like the version number or modification time. Even with what looks like a fairly significant increase in file size due to the added locations, the increase would still be very much worth it. I could imagine OSM publishing a thin and a fat version of the planet PBF (without and with untagged nodes), but with network speeds and disk costs what they are, I don't personally care for the thinner PBF with untagged nodes omitted.

Best,

Jon

jonathanl-telenav avatar Mar 08 '18 16:03 jonathanl-telenav