Speed up trim_osmdata
Because I finally had call to trim a huge data set (New York City within the official boundary polygon => around 700,000 vertices submitted to `sp::point.in.polygon` or `sf::st_within`). The latter especially does not scale well at all, and took something like half an hour. I should just bundle clipper, like I already have in moveability, and use that instead. That should make it entirely scalable.
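For context, a minimal sketch of the kind of local point-in-polygon trim being described, using `sf::st_within`; object names here are illustrative only. This is the step that dominates run time at ~700,000 vertices.

```r
# Illustrative sketch of the slow step: keep only points falling inside a
# boundary polygon via sf::st_within(). Object names are placeholders.
library (sf)

trim_points_to_poly <- function (pts, boundary) {
    # st_within() returns a sparse index list; empty entries lie outside
    inside <- lengths (sf::st_within (pts, boundary)) > 0
    pts [inside, ]
}
```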
I'm using osmium extract to operate directly on .osm files, based on a boundary stored in a GeoJSON file. Could it work for you? The performance is quite good.
I'm using it through system calls to the Windows Subsystem for Linux from R, so it might be tricky to integrate into a stand-alone R package.
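For reference, a rough sketch of driving `osmium extract` from R with a system call; the file names are placeholders, and it assumes `osmium` is on the PATH (the WSL route described above would instead go through `wsl`).

```r
# Rough sketch: clip a raw OSM extract to a boundary polygon by shelling out
# to `osmium extract`. File names are placeholders; osmium must be installed
# and on the PATH.
system2 ("osmium",
         args = c ("extract",
                   "--polygon", "boundary.geojson",   # clip boundary
                   "new-york.osm.pbf",                # input file
                   "-o", "new-york-clipped.osm.pbf")) # clipped output
```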
Yeah, for that kind of operation, osmium is by far the best. On my TODO list is wrapping the src code of that as an R package. I'll get to it one day ... until then, the command line suffices.
Another option is to do the trimming on the server side (also means less downloaded data).
Possibility 1:
- Run Nominatim query and retrieve the osm_id. `getbb("New York City", format_out = "data.frame")` returns `osm_id: 175905`.
- Retrieve the OP pre-calculated area. You need to adjust the id: for relations `id + 3600000000`; for ways `id + 2400000000`. See: OP area filters. In this case: `area(id:3600175905)->.a;`
- Filter results by area: `node[natural=tree](area.a);`

Full query:

```
[out:json][timeout:250];
area(id:3600175905)->.a;
node[natural=tree](area.a);
out body;
```
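A hedged sketch of how Possibility 1 could be driven from R: look up the relation id with `getbb()`, add the Overpass area offset, and submit the assembled query. It assumes `osmdata_sf()` accepts a raw Overpass query string (check the documentation of your installed version), and requests XML rather than JSON because osmdata parses XML responses.

```r
# Sketch only: assumes osmdata_sf() accepts a raw Overpass query string.
library (osmdata)

# 1. Nominatim lookup -> OSM relation id (175905 per the comment above)
bb <- getbb ("New York City", format_out = "data.frame")
rel_id <- as.numeric (bb$osm_id [1])

# 2. Relation id -> Overpass area id (relations: id + 3600000000)
area_id <- rel_id + 3600000000

# 3. Assemble and submit the query (xml output, since osmdata parses XML)
q <- paste0 ("[out:xml][timeout:250];\n",
             "area(id:", area_id, ")->.a;\n",
             "node[natural=tree](area.a);\n",
             "out body;")
trees <- osmdata_sf (q)
```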
Possibility 2:
- Run Nominatim as before.
- Retrieve the respective way or relation based on the osm_id. In this case: `rel(id:175905);`
- Convert to area: `map_to_area->.a;`
- Filter by area: `node[natural=tree](area.a);`

Full query:

```
rel(id:175905);
map_to_area->.a;
node[natural=tree](area.a);
out body;
```
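The same R approach would work for Possibility 2 by swapping the query body, letting Overpass resolve the area via `map_to_area` instead of pre-computing the area id (same assumption as above about passing a raw query string to `osmdata_sf()`):

```r
# Sketch only: Possibility 2, resolving the area server-side via map_to_area.
q2 <- paste0 ("[out:xml][timeout:250];\n",
              "rel(id:175905);\n",
              "map_to_area->.a;\n",
              "node[natural=tree](area.a);\n",
              "out body;")
trees <- osmdata_sf (q2)
```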
Yes indeed that would be useful @Mashin6, and better in all ways. One way to achieve it might be to introduce yet another `trim` function that gets piped before the main call, so we'd have a workflow like:
```r
opq(...) |>
    add_osm_feature(...) |>
    overpass_trim(...) |>
    osmdata_<whatever>()
```
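Purely as a hypothetical sketch of the proposed function (the name comes from the workflow above, but everything else here is invented for illustration): `overpass_trim()` might simply record the Overpass area id on the query object, with the eventual `osmdata_*()` call responsible for wrapping the feature filters in `(area.a)`.

```r
# Hypothetical sketch only: this function does not exist in osmdata, and the
# attribute used to carry the area id is invented for illustration.
overpass_trim <- function (q, osm_id, type = "relation") {
    offset <- if (type == "relation") 3600000000 else 2400000000
    attr (q, "trim_area_id") <- as.numeric (osm_id) + offset
    q
}
```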
There'd still be a use case for both forms, because area polygons don't always exist, and the current `trim_osmdata()` function is intended (among other things) to enable data to be trimmed to entirely arbitrary polygons.
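For comparison, the current client-side form referred to here looks roughly like this, with `getbb(..., format_out = "polygon")` supplying the boundary (a sketch; check argument names against the installed version):

```r
# Sketch of the existing client-side workflow: download, then trim locally
# to the Nominatim boundary polygon of the same query.
bb_poly <- getbb ("New York City", format_out = "polygon")
trees <- opq ("New York City") |>
    add_osm_feature (key = "natural", value = "tree") |>
    osmdata_sf () |>
    trim_osmdata (bb_poly = bb_poly)
```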
If you'd be interested in contributing more directly, please feel free to start a pull request to develop this further. Note also that #252 will require some kind of initial function to determine or validate an OSM area for a given Nominatim query - just to check that the string corresponds to a single OSM relation ID. That would then also be used here.
I agree. Having an option to trim locally by a custom polygon is a useful feature. I will start a new issue for the server-side trimming.