Shapefile.jl icon indicating copy to clipboard operation
Shapefile.jl copied to clipboard

move to GeometryBasics

Open SimonDanisch opened this issue 5 years ago • 17 comments

This PR is aimed at providing GeometryBasics types for Shapefile geometries. One of the major advantages being it's Metadata capability and the ability to plot Shapefiles with Makie/GeoMakie where the former already supports GeometryBasic types.
It's an ecosystem wide movement to move more to GeometryBasics. It is planned to be merged along with GeoInterfaceRFC once both are ready.

SimonDanisch avatar May 17 '20 12:05 SimonDanisch

I've just tried out this branch. It doesn't capture the case where a polygon has multiple exterior rings.

According to the shapefile documentation interior rings can be distinguished from exterior rings because exterior rings run clockwise, while interior rings run counter-clockwise. Here is an algorithm that checks if a non-convex ring runs clockwise: https://stackoverflow.com/a/1165943/4984167 (I've just tried it and it seemed that the sign is wrong)

function f(edge)
    x1, y1 = edge[1]
    x2, y2 = edge[2]
    (x2 - x1) * (y2 + y1)
end

clockwise(edges) = sum(f, edges) > 0
hole(edges) = !clockwise(edges)

So if there is more than one exterior ring, a MultiPolynomial must be used.

Then it could be possible that exterior rings are nested (https://en.wikipedia.org/wiki/Baarle). So either

  • create a partial order on exterior rings and check from the smallest ring, or
  • check if containment in all exterior rings and check if there are multiple containing rings

(EDIT: I believe that this is solved here https://gist.github.com/jkrumbiegel/b82def0a3fb0a822963ec7f97278190c)

For the containment check there is PolygonOps.inpolygon

using GeometryBasics
using PolygonOps
using Underscores

linestring2point(ring) = @_ ring .|> _[1] |> __[[1:end;1]]

@assert rings_lstr isa Vector{<: LineString}

rings_pts = linestring2point.(rings_lstr)
@assert rings_pts isa Vector{<: Vector{<: Point}}

holes = hole.(rings_lstr)
exterior_inds = findall(.! holes)
interior_inds = findall(holes)


if length(exterior_inds) == 1 # construct Polygon
    ext = ring_pts[only(exterior_inds)]
    ints = ring_pts[interior_inds]
    Polygon(ext, ints)
else # construct MultiPolygon
    ext_matched = map(interior_inds) do i_int
        int = rings_pts[i_int]
        pt_int = int[1]
        
        i_ext_matched = findfirst(exterior_inds) do i_ext # doesn't handle the cases when there are nested exterior rings (polygon in a hole)
            poly = rings_pts[i_ext]
            inpolygon(pt_int, poly) == 1
        end
        
        i_ext_matched
    end

    map(enumerate(exterior_inds)) do (i,i_ext)
        ext = rings_pts[exterior_inds][i_ext]
        ints = rings_pts[interior_inds][ext_matched .== i]
        @assert ext isa Vector{<: AbstractPoint}
        @assert ints isa Vector{<: AbstractVector{<: AbstractPoint}}
        Polygon(ext, ints) #fails here
    end |> MultiPolygon
end

greimel avatar Aug 10 '20 21:08 greimel

I see, you're right, interesting. One thing I don't understand though, is why would we need to special case nested exterior rings? The spec doesn't talk about this.

I agree that multiple exterior rings would mean a MultiPolygon. One thing that is not clear to me, is how to assign interior rings to exterior ones, like they need to be in MultiPolygon. Is the sequence always like this, where E = Exterior and I = Interior:

E I I E I
----- ___
pol1 pol2

Such that the first ring must always be exterior, and that interior rings belong to the last exterior ring that was before?

visr avatar Aug 10 '20 21:08 visr

Is the sequence always like this

The second last paragraph here https://docs.oracle.com/database/121/SPATL/polygon-hole.htm might point to what you're saying?

Sov-trotter avatar Aug 10 '20 22:08 Sov-trotter

As far as I understand the shapefile documentation file, the order is arbitrary. So one needs to check which interior polynomial belongs to which exterior polynomial. (I really hope that I am wrong here, this would simplify things a lot)

this becomes more difficult if exterior polygons are nested, because then the interior polygon belongs to the smallest exterior polygon that it contains. Imagine 2N concentric circles with different radii r_1 > r_2 > ... > r_2N. Then these have to be represented as a Multipolygons with N Polygons (ext=r_1, int=r_2), (ext=r_3, int=r_4), ... (ext=r_2N-1, int=2N)

One would have to partially order the exterior polygons. if the interior poly is contained in one of the maximal ext polies, then one has to look at biggest ext polys that are contained in this poly and so on and so forth

I linked to a gist by @jkrumbiegel above, I think he solves the same problem a little different. I didn't have time yet to digest his code though.

greimel avatar Aug 10 '20 22:08 greimel

Is the sequence always like this

The second last paragraph here https://docs.oracle.com/database/121/SPATL/polygon-hole.htm might point to what you're saying?

The ESRI Shapefile Technical Description, page 9, second bullet point, it says

The order of rings in the points array is not significant.

greimel avatar Aug 10 '20 22:08 greimel

I see. This may be an implementation we can borrow from as well: https://github.com/mbostock/shapefile/blob/master/shp/polygon.js

I think if possible I'd like to avoid the additional PolygonOps dependency, and just add the few functions we need here to the code, for internal use.

visr avatar Aug 10 '20 22:08 visr

why don't you want to have it as a dep? It has no dependencies itself, and it really defines only two functions.

If we copy the inpolygon function then we would copy-paste most of the package.

greimel avatar Aug 10 '20 22:08 greimel

I will extend my comment above into a PR against this PR tomorrow

greimel avatar Aug 10 '20 22:08 greimel

Hmm yeah good point. I see now PolygonOps is pretty minimal, so I guess it's fine. Still have to get to terms with the fact that we need to do polygon ops to parse polygons...

Not sure how the PolygonOps and shapefile.js algorithms compare though. The JavaScript looks even more compact.

visr avatar Aug 10 '20 22:08 visr

Still have to get to terms with the fact that we need to do polygon ops to parse polygons...

This goes back to the confusion about the definition of polynomials on Slack. PolygonOps is not about GeometryBasics.Polygons, it's about polygons without holes!

So probably it would be nice to have a GeometryBasics.SimplePolygon type (without holes) and then a Polygon consists of a exterior SimplePolygon and interior LineStrings.

greimel avatar Aug 10 '20 22:08 greimel

Yeah, right now GeometryBasics.Polygon consist of LineString, 1 for exterior, n for interiors.

I'd keep things somewhat close to the Simple Features standard, and would rather use LinearRing than SimplePolygon, since simple in SF is defined like this:

A Geometry is simple if and only if the only self-intersections are at boundary points.

The LinearRing comes back much more, see for instance: https://tools.ietf.org/html/rfc7946#section-3.1.6 https://github.com/JuliaGeo/GeoInterfaceRFC.jl/blob/df62a18d6960007c5c006df875b1654c438d143f/src/types.jl#L19-L20

In terms of representation a LineString and LinearRing would be much the same. We probably need to decide if we also want to have the same point as first and last point just like in the specs, see for instance https://github.com/JuliaGeometry/GeometryBasics.jl/issues/9

visr avatar Aug 11 '20 08:08 visr

Just so that it doesn't get lost. What is to be done here (because it was missing from my PR #40) is

  • [ ] port the MultiPolygon tests from https://github.com/GeospatialPython/pyshp/blob/master/test_shapefile.py
  • [ ] this will require adapting the MultiPolygon code a bit, because pyshp allows deviations from the shape file standards
  • [ ] figure out the correct eltype for MultiPolygons (required for tests to pass)

this is probably material for a separate PR against this branch, but I don't have time for this in the next months.

greimel avatar Sep 29 '20 10:09 greimel

Is there any chance we can get this into a mergeable state?

SimonDanisch avatar Mar 24 '21 10:03 SimonDanisch

@greimel is the current state of Polygon/MultiPolygon according to what you had talked about? If not, I could help to get this up and running?

Sov-trotter avatar Mar 24 '21 10:03 Sov-trotter

I've been using this branch for reading many shapefiles for many countries without problems.

So I think it's mergeable.

  • port the MultiPolygon tests from https://github.com/GeospatialPython/pyshp/blob/master/test_shapefile.py
  • this will require adapting the MultiPolygon code a bit, because pyshp allows deviations from the shape file standards

I've not yet encountered incorrectly specified shapefiles, so I couldn't justify working on that.

figure out the correct eltype for MultiPolygons (required for tests to pass)

I don't remember why this is going wrong. But I think the problem is that the different MultiPolynomials had different types if they have holes or not or if they consist of multiple segments, so we cannot determine the type in advance.

I wondered if using https://juliafolds.github.io/BangBang.jl/v0.1/#BangBang.push!!-Tuple{Any,Any,Any,Vararg{Any,N}%20where%20N} would be acceptable here?

greimel avatar Mar 24 '21 10:03 greimel

As an aside, we should add a map of https://en.wikipedia.org/wiki/Baarle-Hertog to the tests. If we can handle this, what else could go wrong.

greimel avatar Mar 24 '21 10:03 greimel

@greimel can you share that shapefile you're trying to read which gives an error due to shapes = Vector{Union{jltype,Missing}}(undef, 0) ?

Sov-trotter avatar Mar 26 '21 14:03 Sov-trotter