tilemaker
Massive performance issue with coastline shapefiles
Processing a Devon extract with coastline shapefiles (OMT schema) now takes:
real 1m10.838s [down from 2m2.193s]
user 3m30.221s
sys 0m4.077s
Without coastline:
real 0m25.509s
user 0m45.782s
sys 0m1.050s
The main culprit appears to be boost::geometry's intersection (clipping) operation. We're spending 20%+ of total tile output time in there. https://github.com/systemed/tilemaker/commit/e6962f9ddef79793334a012e5b5675efe177c656 has cut it down from the previous 30% but there should still be room for improvement.
Already tried:
Adding an intersection check in ShpMemTiles::addToTileIndexByBbox, rather than just using all tiles between min and max lon: this may give a minor improvement in output times, but greatly increases .shp read time. It requires a change to the method signature (to pass in Geometry &geom), plus:
// inside the loop over candidate tile indices within the bounding box
TileBbox bbox(index, baseZoom);
if (geom::intersects(bbox.clippingBox, geom)) { tileIndex[index].push_back(oo); }
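For context, the "all tiles between min and max" approach visits every base-zoom tile in a geometry's bounding box, and the intersection check above would run once per candidate tile. A minimal sketch of that enumeration, assuming the standard Web Mercator (slippy map) tile formulas and using hypothetical helper names (lon2tilex, lat2tiley, tilesInBbox) rather than tilemaker's own code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helpers using the standard Web Mercator ("slippy map") tile
// formulas; tilemaker's own coordinate code may differ.
constexpr double kPi = 3.14159265358979323846;

// Tile column containing longitude `lon` at zoom `z`, clamped to [0, 2^z - 1].
inline std::uint32_t lon2tilex(double lon, unsigned z) {
    const double n = std::ldexp(1.0, static_cast<int>(z));  // 2^z tiles per axis
    const double x = std::floor((lon + 180.0) / 360.0 * n);
    return static_cast<std::uint32_t>(std::max(0.0, std::min(x, n - 1.0)));
}

// Tile row containing latitude `lat` at zoom `z`; rows count down from the north.
inline std::uint32_t lat2tiley(double lat, unsigned z) {
    const double n = std::ldexp(1.0, static_cast<int>(z));
    const double rad = lat * kPi / 180.0;
    const double y = std::floor((1.0 - std::asinh(std::tan(rad)) / kPi) / 2.0 * n);
    return static_cast<std::uint32_t>(std::max(0.0, std::min(y, n - 1.0)));
}

// Number of tiles a bbox-only index visits for one geometry.
// Assumes the bbox does not cross the antimeridian (minLon <= maxLon).
inline std::uint64_t tilesInBbox(double minLon, double minLat,
                                 double maxLon, double maxLat, unsigned z) {
    const std::uint64_t x0 = lon2tilex(minLon, z), x1 = lon2tilex(maxLon, z);
    // The northern edge (maxLat) maps to the smaller row number.
    const std::uint64_t y0 = lat2tiley(maxLat, z), y1 = lat2tiley(minLat, z);
    return (x1 - x0 + 1) * (y1 - y0 + 1);
}
```

At zoom 14 there are 16384 columns, so each degree of longitude spans roughly 45 tile columns; a coastline polygon whose bbox covers several degrees in each direction is therefore a candidate for a very large number of tiles, most of which a full intersection test would have to visit.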
Is this also why a country like Norway takes so long to generate?
I suspect it's partly that, yes. Probably also that Norway has lots of complex multipolygons for lakes (cf https://cycle.travel/map?lat=62.0958&lon=7.5643&zoom=13), which are slower to assemble and slower for Boost.Geometry to process.
But Norway also has a huge number of tiles to generate: the bounding box of the Geofabrik extract is huge, giving 9 million tiles. Most of them are probably just ocean, so they should be fast to generate, but they appear not to be.
Or maybe the ocean is a very large polygon with many nodes along the coastline, which has to be clipped to the many, many tiles that lie inside it.
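If most of those tiles lie entirely inside the ocean polygon, their clipped result is simply the tile's own box, so in principle they would not need a full intersection. A minimal sketch of such a test, as an illustration only (tilemaker is not known to do this): for a simple ring without holes, if no ring edge touches the tile box and the tile's centre is inside the ring, the whole tile is covered. Pt, Box and the function names are hypothetical; the edge test is the Liang-Barsky parametric clip and the inside test is even-odd ray casting.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal types, not tilemaker's.
struct Pt { double x, y; };
struct Box { double minx, miny, maxx, maxy; };

// Liang-Barsky test: does segment a-b touch the closed box?
inline bool segmentTouchesBox(Pt a, Pt b, const Box& box) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double p[4] = {-dx, dx, -dy, dy};
    const double q[4] = {a.x - box.minx, box.maxx - a.x, a.y - box.miny, box.maxy - a.y};
    double t0 = 0.0, t1 = 1.0;
    for (int i = 0; i < 4; ++i) {
        if (p[i] == 0.0) {
            if (q[i] < 0.0) return false;  // parallel to this box edge and outside it
        } else {
            const double t = q[i] / p[i];
            if (p[i] < 0.0) {  // segment enters through this edge
                if (t > t1) return false;
                if (t > t0) t0 = t;
            } else {           // segment leaves through this edge
                if (t < t0) return false;
                if (t < t1) t1 = t;
            }
        }
    }
    return true;
}

// Even-odd ray casting: is point p inside the ring?
inline bool pointInRing(Pt p, const std::vector<Pt>& ring) {
    bool inside = false;
    for (std::size_t i = 0, j = ring.size() - 1; i < ring.size(); j = i++) {
        const Pt& a = ring[i];
        const Pt& b = ring[j];
        if ((a.y > p.y) != (b.y > p.y) &&
            p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x) {
            inside = !inside;
        }
    }
    return inside;
}

// True if the tile box lies entirely inside a simple ring (no holes).
// If no edge touches the box, the box is wholly inside or wholly outside,
// so testing its centre decides which.
inline bool boxInsideRing(const Box& box, const std::vector<Pt>& ring) {
    if (ring.size() < 3) return false;
    for (std::size_t i = 0; i < ring.size(); ++i) {
        if (segmentTouchesBox(ring[i], ring[(i + 1) % ring.size()], box)) return false;
    }
    const Pt centre{(box.minx + box.maxx) / 2.0, (box.miny + box.maxy) / 2.0};
    return pointInRing(centre, ring);
}
```

With holes (e.g. islands), every inner ring would also have to miss the tile, and the centre would have to lie outside every hole; the sketch leaves that out. The edge loop is still linear in the ring's vertex count, so on its own it only helps if it replaces a much more expensive clip.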
I also notice a significant performance hit when coastline shapefiles are used, and also when the .osm.pbf bounding box covers only a small area relative to the bounding_box parameter supplied in the JSON configuration. Is there any documentation on the code that incorporates shapefiles into the result? I am happy to help think about possible performance gains in this part.
There is no documentation on the design of tilemaker, but in essence it is not that difficult. The shapes are read in: https://github.com/systemed/tilemaker/blob/master/src/read_shp.cpp
Based on the geometry, certain types of OutputObjects are generated: https://github.com/systemed/tilemaker/blob/master/src/output_object.cpp
The generated OutputObjects are attached to the tiles at the basezoom level (14), based on which tiles should contain parts of these geometries: https://github.com/systemed/tilemaker/blob/master/src/tile_data.cpp https://github.com/systemed/tilemaker/blob/master/src/shp_mem_tiles.cpp
Finally the tile data is generated from: https://github.com/systemed/tilemaker/blob/master/src/tile_worker.cpp https://github.com/systemed/tilemaker/blob/master/src/write_geometry.cpp
This is done by clipping, simplifying and writing the relevant OutputObjects for each tile in the vector tile PBF format, and finally storing the tiles on disk or in an .mbtiles file.
What could be helpful is profiling what happens when you use shapefiles. Compile tilemaker with profiling enabled (-pg) and get a profile with and without loading the shapefile. If you want to improve performance, you first need to understand where most of the time is spent.
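Besides gprof, a lighter-weight option is to time just the suspected hot spot directly. A minimal sketch, assuming hypothetical names (RegionTimer, ScopedTimer, and a global `clipTimer` you would declare yourself): wrap the clipping call in a scoped guard and print the totals at the end of the run.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical helper: total time and call count for one code region,
// safe to update from several worker threads at once.
struct RegionTimer {
    std::atomic<std::int64_t> nanos{0};
    std::atomic<std::int64_t> calls{0};
};

// RAII guard: measures the time between construction and destruction
// and adds it to the RegionTimer.
class ScopedTimer {
public:
    explicit ScopedTimer(RegionTimer& timer)
        : timer_(timer), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        timer_.nanos += std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        ++timer_.calls;
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    RegionTimer& timer_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage would be along the lines of `{ ScopedTimer t(clipTimer); /* the geom::intersection call */ }`. Because tile output runs on several threads, the accumulated total is summed across threads, so it is comparable to the `user` figure from `time` rather than `real`.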
If @systemed is correct that the main culprit is clipping:
The main culprit appears to be boost::geometry's intersection (clipping) operation. We're spending 20%+ of total tile output time in there. e6962f9 has cut it down from the previous 30% but there should still be room for improvement.
Then it might be useful to implement box clipping. At the moment Boost.Geometry only supports clipping a multipolygon to a polygon. Could a single polygon perhaps be clipped to a box instead?
Tippecanoe implements a fast box clipper: https://github.com/mapbox/tippecanoe/blob/18e53cd7fb9ae6be8b89d817d69d3ce06f30eb9d/mapbox/geometry/wagyu/quick_clip.hpp
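The core of a box clipper for a single ring is short. A minimal sketch of Sutherland-Hodgman clipping against an axis-aligned box, in the spirit of (not copied from) Tippecanoe's quick_clip; Pt, Box and the function names are hypothetical, and the input ring is assumed to be open (last point not repeated).

```cpp
#include <vector>

// Hypothetical minimal types, not tilemaker's or wagyu's.
struct Pt { double x, y; };
struct Box { double minx, miny, maxx, maxy; };

// One Sutherland-Hodgman pass: keep the part of an open ring on the inside of a
// single box edge. `inside` tests a point; `cross` returns where segment a-b
// meets that edge (only called when a and b are on opposite sides of it).
template <typename Inside, typename Cross>
std::vector<Pt> clipAgainstEdge(const std::vector<Pt>& in, Inside inside, Cross cross) {
    std::vector<Pt> out;
    if (in.empty()) return out;
    Pt prev = in.back();
    bool prevIn = inside(prev);
    for (const Pt& cur : in) {
        const bool curIn = inside(cur);
        if (curIn != prevIn) out.push_back(cross(prev, cur));  // edge crosses the boundary
        if (curIn) out.push_back(cur);
        prev = cur;
        prevIn = curIn;
    }
    return out;
}

// Clip an open ring to the box, one box edge at a time.
inline std::vector<Pt> clipRingToBox(std::vector<Pt> ring, const Box& b) {
    auto atX = [](double x) {  // intersection with the vertical line at x
        return [x](Pt a, Pt c) { return Pt{x, a.y + (x - a.x) / (c.x - a.x) * (c.y - a.y)}; };
    };
    auto atY = [](double y) {  // intersection with the horizontal line at y
        return [y](Pt a, Pt c) { return Pt{a.x + (y - a.y) / (c.y - a.y) * (c.x - a.x), y}; };
    };
    ring = clipAgainstEdge(ring, [&b](Pt p) { return p.x >= b.minx; }, atX(b.minx));
    ring = clipAgainstEdge(ring, [&b](Pt p) { return p.x <= b.maxx; }, atX(b.maxx));
    ring = clipAgainstEdge(ring, [&b](Pt p) { return p.y >= b.miny; }, atY(b.miny));
    ring = clipAgainstEdge(ring, [&b](Pt p) { return p.y <= b.maxy; }, atY(b.maxy));
    return ring;
}
```

Each pass is linear in the number of vertices, and holes can be clipped the same way one ring at a time, with orientation preserved. The trade-off is that a concave ring which leaves and re-enters the box comes out as a single ring joined by zero-area edges along the box boundary rather than as separate polygons, so whether that output is acceptable would need checking. If the rest of the pipeline expects closed rings (Boost.Geometry's default polygon model does), the first point has to be appended again.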