oshdb icon indicating copy to clipboard operation
oshdb copied to clipboard

improve performance for queries on sparse multipolygons

Open tyrasd opened this issue 5 years ago • 3 comments

Currently, when querying large, but very sparse multipolygons as areas-of-interest, it can happen that a lot of CPU time is spent inside the bbox <-> polygon testing, leading to poor performance:

image

In these situations, the current implementation of the fip package are not ideal (it assumes somewhat contiguous and not too unisotropic input polygons to work really well).

see also https://github.com/GIScience/oshdb/blob/0.5.9/oshdb-util/src/main/java/org/heigit/bigspatialdata/oshdb/util/geometry/fip/FastInPolygon.java#L72

//cc @SlowMo24

tyrasd avatar Oct 26 '20 13:10 tyrasd

I don't know if this helps but when I had to check which polygon a geometry lies in and the polygons had 'bad bboxes' (e.g. france with DOM/TOM) created an STRTree of polygons, not Mulipolygons:

stub:

Geometry geometry = wkbReader.read(geomBytes);
for (int i = 0; i < geometry.getNumGeometries(); i++) {
    Geometry geometryN = geometry.getGeometryN(i);
    index.insert(geometryN.getEnvelopeInternal(), new Object[]{theID, geometryN});
}

depending on the cost of area or complexity calculation something like https://postgis.net/docs/ST_Subdivide.html (this surely exists in jts or geotools) on very large/complex/overlapping polygons could be used to increase performance of STRTree even further. I then used that Index as follows within aggregateBy. The order of query might be difficult to determine:

  • search for bbox in polygon index or vice versa. There might also be a possibility to intersect two indices?

Attention, very ugly and possibly wrong code:

return index
    .query(centroid.getEnvelopeInternal())
    .parallelStream()
    .filter(entry -> ((Polygon) entry[1]).contains(geometry))
    .map(entry -> ((int) entry[0]))
    .collect(Collectors.toList());

//EDIT: correct code that was copy pasted out of context

SlowMo24 avatar Oct 28 '20 09:10 SlowMo24

yeah, for such geometries an implementation like what you propose would make sense. JTS even has a built-in utility/mechanism (called PreparedGeometry) which basically does more or less exactly this. But in "normal" situations the current fip package does a better job than JTS:

image

The crucial question would be to how to decide which implementation to use in which situation? And/or if it would be not even more performant to come up with a solution which can combine both approaches for sparse multipolygons: for the example of France -> split "distinct" regions into R-tree superstructure, and use the current fip routines to perform exact geometry tests. :thinking:

tyrasd avatar Oct 28 '20 17:10 tyrasd

great you did some benchmarking!!! Disclaimer: I am not sure I fully understand the problem. Maybe you could add some more information to the issue-text.

Anyway I will share my thought at the risk of them being off-topic: The problem we face is that sparse polygons in the current implementation force us to 'touch'/iterate a large number of GridOSHEntities because they lie within the bbox but not the Polygon. So even though the PreparedGeom is slower this would only be a one-time check and the actual computation would be faster as it would only 'touch' the necessary GridOSHEntities. To my understanding the same then applies to OSHEntities within the GridOSHEntity where we can first filter based on index and bbox and then run the actual computation on objects that additionally satisfy fip .

SlowMo24 avatar Oct 29 '20 11:10 SlowMo24