built in strategies for getting Way/Relation locations

Open michaelkirk opened this issue 4 years ago • 3 comments

In the OSM data structures, only Nodes carry a location. Ways and Relations reference a set of nodes - so their location exists implicitly through the location of their nodes.

This is arguably a usefully compact and DRY usage of data structures, but when you're actually trying to do something useful with OSM data, you'll need to get the geometry for each Way / Relation.

Currently osmio mostly leaves this up to the client code to implement, but I think it'd make sense to provide osmio users with a blessed way to do this.

The algorithm tends to look something like this:

  1. First, store each Node's location, indexed by the Node's node_id
  2. Then, for each Way, take its node_ids and use them to look up the stored locations, copying them onto the Way.
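
The two steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not osmio's API: `Node`, `Way`, `Location`, `index_nodes` and `way_geometry` are all hypothetical names made up for this example.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for OSM objects; these are NOT osmio's actual types.
type NodeId = i64;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Location {
    lat: f64,
    lon: f64,
}

struct Node {
    id: NodeId,
    location: Location,
}

struct Way {
    node_ids: Vec<NodeId>,
}

/// Step 1: store each node's location, indexed by node id.
fn index_nodes(nodes: &[Node]) -> HashMap<NodeId, Location> {
    nodes.iter().map(|n| (n.id, n.location)).collect()
}

/// Step 2: look up the stored location of every node the way references,
/// preserving the order of `node_ids`.
/// Returns `None` if any referenced node is missing from the index
/// (for example, in an extract that does not contain every referenced node).
fn way_geometry(way: &Way, index: &HashMap<NodeId, Location>) -> Option<Vec<Location>> {
    way.node_ids.iter().map(|id| index.get(id).copied()).collect()
}
```

Returning `Option` here is just one choice; a real implementation might prefer to skip missing nodes or report which ids were not found.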

This is conceptually simple, but becomes non-trivial when dealing with a range of file sizes. For very small files you probably want to just store everything in some kind of hashtable in RAM, but for very large files (e.g. a continent.pbf or planet.pbf) this could require many GB. If you can afford it, you might still want it all in RAM, but for resource-constrained environments there should be a way to write the node locations out to a file. (nodestore seems to exist to this end.)
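
As a rough illustration of one point on that memory spectrum, here is a sketch of a more compact in-RAM store than a hashtable: a sorted `Vec` with binary-search lookups, keeping coordinates as fixed-point integers. Everything here (`SortedNodeStore`, `to_fixed`, `from_fixed`) is hypothetical and not part of osmio or nodestore, and the 1e-7 degree scale is an assumption that matches the 7 decimal places OSM coordinates are usually given in.

```rust
/// Hypothetical compact store (not part of osmio).
/// Each entry is (node id, lat * 1e7, lon * 1e7), i.e. 8 + 4 + 4 bytes of payload.
#[derive(Default)]
struct SortedNodeStore {
    entries: Vec<(i64, i32, i32)>,
    sorted: bool,
}

impl SortedNodeStore {
    /// Append a node; entries may arrive in any order.
    fn insert(&mut self, id: i64, lat: f64, lon: f64) {
        self.entries.push((id, to_fixed(lat), to_fixed(lon)));
        self.sorted = false;
    }

    /// Sort once after all nodes are inserted so lookups can binary-search.
    fn finish(&mut self) {
        self.entries.sort_unstable_by_key(|e| e.0);
        self.sorted = true;
    }

    /// Look up a node's (lat, lon) in degrees.
    fn get(&self, id: i64) -> Option<(f64, f64)> {
        debug_assert!(self.sorted, "call finish() before get()");
        let i = self.entries.binary_search_by_key(&id, |e| e.0).ok()?;
        let (_, lat, lon) = self.entries[i];
        Some((from_fixed(lat), from_fixed(lon)))
    }
}

/// Degrees to fixed-point units of 1e-7 degrees (180 * 1e7 fits in an i32).
fn to_fixed(deg: f64) -> i32 {
    (deg * 1e7).round() as i32
}

/// Fixed-point units of 1e-7 degrees back to degrees.
fn from_fixed(v: i32) -> f64 {
    f64::from(v) / 1e7
}
```

The same flat layout would also be a plausible starting point for a file-backed store, since fixed-size records can be written out and searched without a hashtable.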

I don't know exactly what the interface would look like, but I'd propose adding a couple of different strategies and documentation for hydrating Way and Relation locations into osmio.

michaelkirk avatar Jun 04 '21 16:06 michaelkirk

I agree. Someone asked me about this feature on the OSMUS Slack too. It's an important and missing feature.

I have not yet done any real work on this.

One approach is to make a Rust trait for this kind of thing. You really only need set and lookup functions. (I'm not sure of the names yet.)

There are a few ways to actually implement this algorithm, with different trade-offs, but I think if you make it generic over a trait, then osmio can just work with any implementation, which gives users of the library flexibility.
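
To make the idea concrete, such a trait might look something like the sketch below. The trait name `NodeLocationStore`, the method names `set` and `get`, and the `InMemoryStore` backend are all placeholders invented for this example; nothing here is existing osmio API.

```rust
use std::collections::HashMap;

/// Hypothetical trait: a store only needs a setter and a lookup.
trait NodeLocationStore {
    fn set(&mut self, node_id: i64, lat: f64, lon: f64);
    fn get(&self, node_id: i64) -> Option<(f64, f64)>;
}

/// One possible backend: keep everything in RAM.
#[derive(Default)]
struct InMemoryStore(HashMap<i64, (f64, f64)>);

impl NodeLocationStore for InMemoryStore {
    fn set(&mut self, node_id: i64, lat: f64, lon: f64) {
        self.0.insert(node_id, (lat, lon));
    }

    fn get(&self, node_id: i64) -> Option<(f64, f64)> {
        self.0.get(&node_id).copied()
    }
}

/// Library-side code depends only on the trait, so a file-backed or
/// otherwise more compact store could be swapped in without changes here.
/// Returns `None` if any referenced node is unknown to the store.
fn resolve_way<S: NodeLocationStore>(store: &S, node_ids: &[i64]) -> Option<Vec<(f64, f64)>> {
    node_ids.iter().map(|&id| store.get(id)).collect()
}
```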

amandasaurus avatar Jun 04 '21 19:06 amandasaurus

Once or twice when using this library, I didn't have the memory to store all nodes in the file, but I did have enough to store the nodes for the ways I wanted to process. In that situation a two-pass approach is better. It would be good to be able to support this sort of trade-off.

I think this "trait" approach could do it; I have a half-formed idea in my head.
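
A rough sketch of what the two-pass idea could look like is below. It works on in-memory slices for brevity, whereas real code would read the file twice; the types and the function names `wanted_node_ids` and `index_wanted_nodes` are hypothetical, not osmio API.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified types; NOT osmio's actual types.
type Location = (f64, f64);

struct Node {
    id: i64,
    location: Location,
}

struct Way {
    id: i64,
    node_ids: Vec<i64>,
}

/// Pass 1: collect the ids of all nodes referenced by the ways we care about.
fn wanted_node_ids<F: Fn(&Way) -> bool>(ways: &[Way], wanted: F) -> HashSet<i64> {
    ways.iter()
        .filter(|&w| wanted(w))
        .flat_map(|w| w.node_ids.iter().copied())
        .collect()
}

/// Pass 2: store locations only for those nodes, skipping everything else.
fn index_wanted_nodes(nodes: &[Node], wanted: &HashSet<i64>) -> HashMap<i64, Location> {
    nodes
        .iter()
        .filter(|n| wanted.contains(&n.id))
        .map(|n| (n.id, n.location))
        .collect()
}
```

The cost is reading the input twice; the benefit is that memory scales with the nodes of the selected ways rather than with every node in the file.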

amandasaurus avatar Jun 05 '21 06:06 amandasaurus

> Once or twice when using this library, I didn't have the memory to store all nodes in the file, but I did have enough to store the nodes for the ways which I wanted to process. So a two pass approach is better. It would be good to be able to support this sort of trade off.

Re: a two-pass approach: I think this can be a good strategy for some cases. If you're converting the entire file, you can save something like half the memory (of course it depends on your particular data...). But it's especially useful if you want to one day also support a clipping rect/poly, because then your memory usage is pretty much proportional to the size of your clip.
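
For the clipping case, a minimal sketch (assuming a simple axis-aligned rectangle; `BBox` and `index_clipped` are hypothetical names, not osmio API) might only store the nodes that fall inside the clip:

```rust
use std::collections::HashMap;

/// Hypothetical axis-aligned clip rectangle, in degrees.
struct BBox {
    min_lat: f64,
    min_lon: f64,
    max_lat: f64,
    max_lon: f64,
}

impl BBox {
    /// True if the point lies inside the rectangle (edges inclusive).
    fn contains(&self, lat: f64, lon: f64) -> bool {
        lat >= self.min_lat && lat <= self.max_lat && lon >= self.min_lon && lon <= self.max_lon
    }
}

/// Store only the nodes inside the clip, so memory use tracks the clip size
/// rather than the size of the whole file. Input tuples are (id, lat, lon).
fn index_clipped(nodes: &[(i64, f64, f64)], clip: &BBox) -> HashMap<i64, (f64, f64)> {
    nodes
        .iter()
        .filter(|&&(_, lat, lon)| clip.contains(lat, lon))
        .map(|&(id, lat, lon)| (id, (lat, lon)))
        .collect()
}
```

Ways that cross the clip boundary will then reference nodes that were never stored, so the lookup side has to decide whether to drop such ways, truncate them, or fetch the missing nodes in another pass.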

I agree that a trait based approach is probably a good foundation, so that different use cases can be supported.

Is this something you have time/interest to work on?

michaelkirk avatar Jun 06 '21 00:06 michaelkirk