geos icon indicating copy to clipboard operation
geos copied to clipboard

Improve consistency of `GEOSVoronoiDiagram`

Open BuonOmo opened this issue 3 years ago • 1 comments

change

the getDiagramEdges now only returns a MultiLineString, whether there is one or many lines. Which reduces to two the number of possible kind of geometries given by GEOSVoronoiDiagram

reason

I've been working on adding voronoi_diagram to RGeo. And looking at the Shapely PR for voronoi diagram and mine in RGeo, it seems like the GEOSVoronoiDiagram CAPI method causes confusion by returning three possible kind of geometries:

  • GeometryCollection (when onlyEdges is false)
  • MultiLineString (onlyEdges is true and there are at least two lines)
  • LineString (onlyEdges is true and there is only one line)

The parameter onlyEdges changes the outcome, which is ok to reason about. However, the fact that we could have either a LineString or a MultiLineString is way harder to work around in my opinion.

Note for reviewer

It's been year since I haven't made a single line of C++, feel free to fix the implementation! To be honest, I've copied the code from other parts of the library, until one was working 😅

Alternative

We could also wrap the result in a GeometryCollection, to comply to the original description and be even more consistent. But I guess this would be more work here, and less efficient as well

BuonOmo avatar Oct 14 '22 15:10 BuonOmo

Actually, after running a benchmark, the alternative is not really slower than the first proposition. Hence I'll change for the alternative. I'll do that in a separate commit, which will be easier to reverse.

Benchmark
///usr/bin/env g++ ${0} -std=c++11 -isystem ../google/benchmark/include -I/usr/local/include -L../google/benchmark/build/src -L/usr/local/lib -lbenchmark -lgeos -o ${0%%.cc} && ${0%%.cc} ; exit $?
#include <benchmark/benchmark.h>

#include <geos/triangulate/VoronoiDiagramBuilder.h>

#include <geos/io/WKTReader.h>
#include <geos/io/WKBReader.h>

using namespace geos::triangulate;
using namespace geos::geom;
using namespace geos::io;

#define bench(name, wkt) static void multiline_##name(benchmark::State& state) { \
  geos::triangulate::VoronoiDiagramBuilder builder; \
  std::unique_ptr<Geometry> geom(readTextOrHex(wkt)); \
  const GeometryFactory& geomFact(*GeometryFactory::getDefaultInstance()); \
  builder.setSites(*geom); \
  for (auto _ : state) { \
    builder.getDiagramEdges(geomFact); \
  } \
} \
BENCHMARK(multiline_##name); \
static void collection_##name(benchmark::State& state) { \
  geos::triangulate::VoronoiDiagramBuilder builder; \
  std::unique_ptr<Geometry> geom(readTextOrHex(wkt)); \
  const GeometryFactory& geomFact(*GeometryFactory::getDefaultInstance()); \
  builder.setSites(*geom); \
  for (auto _ : state) { \
    builder.getDiagramEdgesCollection(geomFact); \
  } \
} \
BENCHMARK(collection_##name);

static std::unique_ptr<Geometry>
readTextOrHex(const char* wkt)
{
    WKTReader wktreader;
    WKBReader wkbreader;
    std::string wktstr(wkt);
    if ("01" == wktstr.substr(0, 2) || "00" == wktstr.substr(0, 2))
    {
        std::stringstream is(wkt);
        return wkbreader.readHEX(is);
    }
    else
        return wktreader.read(wktstr);
}

bench(short, "LINESTRING (10 10, 20 20, 30 30)")
bench(long, "0104000080170000000101000080EC51B81E11A20741EC51B81E85A51C415C8FC2F528DC354001010000801F85EB5114A207415C8FC2F585A51C417B14AE47E1BA3540010100008085EB51B818A20741A8C64B3786A51C413E0AD7A3709D35400101000080000000001BA20741FED478E984A51C413E0AD7A3709D3540010100008085EB51B818A20741FED478E984A51C413E0AD7A3709D354001010000800AD7A37016A20741FED478E984A51C413E0AD7A3709D35400101000080000000001BA2074154E3A59B83A51C413E0AD7A3709D3540010100008085EB51B818A2074154E3A59B83A51C413E0AD7A3709D354001010000800AD7A37016A2074154E3A59B83A51C413E0AD7A3709D35400101000080000000001BA20741AAF1D24D82A51C413E0AD7A3709D3540010100008085EB51B818A20741AAF1D24D82A51C413E0AD7A3709D35400101000080F6285C8F12A20741EC51B81E88A51C414160E5D022DB354001010000802222222210A2074152B81EC586A51C414160E5D022DB354001010000804F1BE8B40DA2074152B81EC586A51C414160E5D022DB354001010000807B14AE470BA2074152B81EC586A51C414160E5D022DB354001010000802222222210A20741B81E856B85A51C414160E5D022DB354001010000804F1BE8B40DA20741B81E856B85A51C414160E5D022DB354001010000807B14AE470BA20741B81E856B85A51C414160E5D022DB35400101000080A70D74DA08A20741B81E856B85A51C414160E5D022DB35400101000080D4063A6D06A20741B81E856B85A51C414160E5D022DB354001010000807B14AE470BA207411F85EB1184A51C414160E5D022DB35400101000080A70D74DA08A207411F85EB1184A51C414160E5D022DB35400101000080D4063A6D06A207411F85EB1184A51C414160E5D022DB3540")

BENCHMARK_MAIN();
Run on (10 X 24.1209 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x10)
Load Average: 3.19, 2.77, 2.67
-----------------------------------------------------------
Benchmark                 Time             CPU   Iterations
-----------------------------------------------------------
multiline_short      313247 ns       313187 ns         2235
collection_short     318665 ns       318316 ns         2197
multiline_long      2349749 ns      2349218 ns          294
collection_long     2407711 ns      2406856 ns          291

Personnaly I prefer this option as it is even simpler in the CAPI GEOSVoronoiDiagram usage.

BuonOmo avatar Oct 17 '22 12:10 BuonOmo

I'm a little concerned about the CI failure, is this rebased to main? I don't really see how your API change could cause the failure, but neither do I want to merge it if it's somehow coming from there....

pramsey avatar Nov 09 '22 18:11 pramsey

I think we're ok, the failure was fixed in main at 433b91aa0bba05ce8833f5831a4bd82e8450a9e8

dbaston avatar Nov 09 '22 21:11 dbaston

Does this change the return value to always be an actual GeometryCollection, which contains LineStrings if onlyEdges is true?

If so, I am not in favour of this change. One reason is that MultiLineString is more widely supported than a GeometryCollection in the GEOS API (i.e. by spatial predicates and overlay ops). Also, since it is a subclass of GeometryCollection there should be no difference in downstream processing.

Was the stated intention was to eliminate the possibility of returning a single LineString? This is more reasonable., especially since I'm not sure a Voronoi diagram can ever contain a single edge. Although there are other places in the API which can return either a single LineString or a MultiLineString, so does this really need to a special case?

Note that the getNumGeometries and getGeometryN methods are on the Geometry class. This is a deliberate difference to the SFS standard API, and was done to allow more uniform handling of collections and singleton geometries.

dr-jts avatar Nov 09 '22 23:11 dr-jts

@pramsey the failure was in main at the time indeed, so I'll rebase

@dr-jts, both options were considered (and commits for both options are available to be discussed in the branch). I'd be happy to rollback to the previous one.

For the reasoning behind it, I'm not directly using GEOS C++ API, but GEOS C API to maintain its Ruby bindings (RGeo)

As I have seen in the Shapely implementation of voronoi diagram, and as I see now for RGeo, the fact that this method can return this wide range of types is confusing. And the single LineString is even more, as in more dynamic languages where you can directly iterate over the result, it would be counter intuitive to offer iteration over a single geometry.

I'll rollback to the initial proposition of making sure we at least have a collection. Tell me if that works for you or not.

EDIT: You'll find two commits with the two proposal, and the commit messages indicating the reasoning behind. Hopefully the latter is ok with you

BuonOmo avatar Nov 10 '22 10:11 BuonOmo

The MultiLineString option looks good to me.

The C API doc needs some work to describe the two potential results. Feel free to take a crack at this, or it can be fixed up later.

dr-jts avatar Nov 10 '22 18:11 dr-jts

To me it feels like maybe there should be a separate C API function to return Voronoi edges (which is how this is exposed in PostGIS). Maybe that horse has left the barn, though.

dr-jts avatar Nov 10 '22 19:11 dr-jts

To me it feels like maybe there should be a separate C API function to return Voronoi edges (which is how this is exposed in PostGIS). Maybe that horse has left the barn, though.

@dr-jts yeah I felt exactly the same way when implementing. Since lots of libraries depend on GEOS CAPI, I did not dare.

I'll try and and fix the documentation, but I'm not sure I'd be the best at this. I'll try tomorrow, if it is not enough, I'll revert so you can merge this before it takes too long. What do you think?

BuonOmo avatar Nov 10 '22 22:11 BuonOmo

I'll try and and fix the documentation, but I'm not sure I'd be the best at this. I'll try tomorrow, if it is not enough, I'll revert so you can merge this before it takes too long. What do you think?

Sounds good!

dr-jts avatar Nov 10 '22 22:11 dr-jts

Another reason for sticking with MultiLineString as the return type for edges is that it's non-breaking change. Switching to a GeometryCollection might break API.

dr-jts avatar Nov 10 '22 22:11 dr-jts

Another reason for sticking with MultiLineString as the return type for edges is that it's non-breaking change. Switching to a GeometryCollection might break API.

It is still breaking since it doesn't return a LineString anymore. Although this could arguably be considered a bug fix.

And per documentation, returning a GeometryCollection would actually be already, since that is what is written.

Anyway, I'm fooling around. The MultiLineString choices seems practical, and will be easy enough to explain in RGeo

BuonOmo avatar Nov 10 '22 22:11 BuonOmo

@pramsey @dr-jts is there any blocker left?

BuonOmo avatar Nov 20 '22 15:11 BuonOmo

@dr-jts @dbaston @pramsey Is this okay to commit or something else needs to be done?

robe2 avatar Nov 22 '22 01:11 robe2