Speed up trim_osmdata

Open mpadge opened this issue 4 years ago • 5 comments

I finally had cause to trim a huge data set (New York City within the official boundary polygon, which meant around 700,000 vertices submitted to sp::point.in.polygon or sf::st_within). The latter especially does not scale well at all, and took something like half an hour. I should just bundle clipper, as I already have in moveability, and use that instead. That should make it entirely scalable.

mpadge, Jul 12 '19 15:07

I'm using osmium extract to operate directly on .osm files, based on a boundary stored in a geojson file. Could it work for you? The performance is quite good.

I'm using it through system calls to the Windows Subsystem for Linux, from R, so it might be tricky to integrate it within a standalone R package.
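For anyone wanting to try the same approach, here is a minimal sketch of calling osmium extract from R. The helper names osmium_extract_args() and run_osmium_extract(), the use_wsl switch, and the file names are all hypothetical; only the osmium extract subcommand and its --polygon and --output options come from the osmium command line tool. It assumes osmium is installed and, when going through WSL, that the paths are given as Linux paths.

```r
# Hypothetical helper: build the argument vector for `osmium extract`.
# `boundary` is a GeoJSON polygon file, `input` the .osm/.osm.pbf file.
osmium_extract_args <- function(input, boundary, output, overwrite = FALSE) {
    args <- c("extract", "--polygon", boundary, input, "--output", output)
    if (overwrite) {
        args <- c(args, "--overwrite")
    }
    args
}

# Hypothetical wrapper: run osmium directly, or through WSL on Windows.
# system2() does not quote arguments itself, so they are quoted here.
run_osmium_extract <- function(input, boundary, output, use_wsl = FALSE) {
    args <- shQuote(osmium_extract_args(input, boundary, output))
    if (use_wsl) {
        system2("wsl", c("osmium", args))
    } else {
        system2("osmium", args)
    }
}

# Example call (not run here, since it needs osmium and real files):
# run_osmium_extract("nyc.osm.pbf", "boundary.geojson", "nyc-trimmed.osm.pbf")
```

The trimmed file could then be read back into R in whatever way the rest of the workflow expects.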

FlxPo, Nov 02 '21 14:11

Yeah, for that kind of operation, osmium is by far the best. Wrapping its source code as an R package is on my TODO list. I'll get to it one day ... until then, the command line suffices.

mpadge, Nov 02 '21 16:11

Another option is to do the trimming on the server side, which also means less data to download.

Possibility 1:

  • Run a Nominatim query and retrieve the osm_id. getbb("new York City", format_out = "data.frame") returns osm_id: 175905.
  • Retrieve the pre-calculated Overpass (OP) area. You need to adjust the id: for relations, id + 3600000000; for ways, id + 2400000000. See: OP area filters. In this case: area(id:3600175905)->.a;
  • Filter results by area: node[natural=tree](area.a);

Full query:

[out:json][timeout:250];
area(id:3600175905)->.a;
node[natural=tree](area.a);
out body;
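
The id adjustment and the query above can be sketched in R. This is only an illustration, not part of osmdata: overpass_area_id() and area_filter_query() are hypothetical names, and the offsets follow the Overpass convention of adding 3600000000 for relations and 2400000000 for ways, which matches the example id 3600175905.

```r
# Hypothetical helper: convert an OSM relation or way id to an Overpass area id.
# Doubles are used because the result exceeds R's integer range.
overpass_area_id <- function(osm_id, osm_type = c("relation", "way")) {
    osm_type <- match.arg(osm_type)
    offset <- c(relation = 3600000000, way = 2400000000)
    as.numeric(osm_id) + offset[[osm_type]]
}

# Hypothetical helper: build the area-filtered query as a single string.
area_filter_query <- function(osm_id, feature = "node[natural=tree]",
                              osm_type = "relation", timeout = 250) {
    # "%.0f" avoids scientific notation for the large area id
    area_id <- sprintf("%.0f", overpass_area_id(osm_id, osm_type))
    paste(
        sprintf("[out:json][timeout:%d];", as.integer(timeout)),
        sprintf("area(id:%s)->.a;", area_id),
        sprintf("%s(area.a);", feature),
        "out body;",
        sep = "\n"
    )
}

# Print the query for the New York City relation
cat(area_filter_query(175905), "\n")
```

The string could then be sent to an Overpass endpoint by whatever means is convenient; that step is left out here.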



Possibility 2:

  • Run the Nominatim query as before.
  • Retrieve the respective way or relation based on the osm_id. In this case: rel(id:175905);
  • Convert it to an area: map_to_area->.a;
  • Filter by area: node[natural=tree](area.a);

Full query:

rel(id:175905);
map_to_area->.a;
node[natural=tree](area.a);
out body;

Mashin6, Nov 30 '21 01:11

Yes indeed that would be useful @Mashin6, and better in all ways. One way to achieve it might be to introduce yet another trim function that gets piped before the main call, so we'd have a workflow like:

opq(...) |>
    add_osm_feature(...) |>
    overpass_trim(...) |>
    osmdata_<whatever>()

There'd still be a use case for both forms, because area polygons don't always exist, and the current trim_osmdata() function is intended (among other things) to enable data to be trimmed to entirely arbitrary polygons.

If you'd be interested in contributing more directly, please feel free to start a pull request to develop this further. Note also that #252 will require some kind of initial function to determine or validate an OSM area for a given Nominatim query, just to check that the string corresponds to a single OSM relation ID. That would then also be used here.
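
A rough sketch of what such a validation step might look like, under several assumptions: single_relation_id() is a hypothetical name, the input is a data frame with osm_type and osm_id columns (as Nominatim results usually carry, e.g. from getbb(..., format_out = "data.frame")), and non-relation rows are simply ignored. The actual function for #252 may well need different rules.

```r
# Hypothetical check: does a Nominatim result correspond to exactly one
# OSM relation? Returns the relation id, or stops with an error.
single_relation_id <- function(places) {
    stopifnot(
        is.data.frame(places),
        all(c("osm_type", "osm_id") %in% names(places))
    )
    # Assumption: only rows of type "relation" are candidates for an area
    rels <- places[places$osm_type == "relation", , drop = FALSE]
    ids <- unique(as.character(rels$osm_id))
    if (length(ids) != 1L) {
        stop("expected exactly one OSM relation, found ", length(ids))
    }
    as.numeric(ids)
}

# Example with a hand-made data frame instead of a live Nominatim call
places <- data.frame(
    osm_type = c("relation", "node"),
    osm_id = c("175905", "61785451"),
    stringsAsFactors = FALSE
)
single_relation_id(places)
```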

mpadge, Nov 30 '21 08:11

I agree. Having an option to trim locally by a custom polygon is a useful feature. I will start a new issue for the server-side trimming.

Mashin6, Dec 04 '21 05:12