Shapefile.jl
Shapefile.jl copied to clipboard
move to GeometryBasics
This PR is aimed at providing GeometryBasics types for Shapefile geometries. One of the major advantages being it's Metadata capability and the ability to plot Shapefiles with Makie/GeoMakie where the former already supports GeometryBasic types.
It's an ecosystem wide movement to move more to GeometryBasics.
It is planned to be merged along with GeoInterfaceRFC once both are ready.
I've just tried out this branch. It doesn't capture the case where a polygon has multiple exterior rings.
According to the shapefile documentation interior rings can be distinguished from exterior rings because exterior rings run clockwise, while interior rings run counter-clockwise. Here is an algorithm that checks if a non-convex ring runs clockwise: https://stackoverflow.com/a/1165943/4984167 (I've just tried it and it seemed that the sign is wrong)
function f(edge)
x1, y1 = edge[1]
x2, y2 = edge[2]
(x2 - x1) * (y2 + y1)
end
clockwise(edges) = sum(f, edges) > 0
hole(edges) = !clockwise(edges)
So if there is more than one exterior ring, a MultiPolynomial must be used.
Then it could be possible that exterior rings are nested (https://en.wikipedia.org/wiki/Baarle). So either
- create a partial order on exterior rings and check from the smallest ring, or
- check if containment in all exterior rings and check if there are multiple containing rings
(EDIT: I believe that this is solved here https://gist.github.com/jkrumbiegel/b82def0a3fb0a822963ec7f97278190c)
For the containment check there is PolygonOps.inpolygon
using GeometryBasics
using PolygonOps
using Underscores
linestring2point(ring) = @_ ring .|> _[1] |> __[[1:end;1]]
@assert rings_lstr isa Vector{<: LineString}
rings_pts = linestring2point.(rings_lstr)
@assert rings_pts isa Vector{<: Vector{<: Point}}
holes = hole.(rings_lstr)
exterior_inds = findall(.! holes)
interior_inds = findall(holes)
if length(exterior_inds) == 1 # construct Polygon
ext = ring_pts[only(exterior_inds)]
ints = ring_pts[interior_inds]
Polygon(ext, ints)
else # construct MultiPolygon
ext_matched = map(interior_inds) do i_int
int = rings_pts[i_int]
pt_int = int[1]
i_ext_matched = findfirst(exterior_inds) do i_ext # doesn't handle the cases when there are nested exterior rings (polygon in a hole)
poly = rings_pts[i_ext]
inpolygon(pt_int, poly) == 1
end
i_ext_matched
end
map(enumerate(exterior_inds)) do (i,i_ext)
ext = rings_pts[exterior_inds][i_ext]
ints = rings_pts[interior_inds][ext_matched .== i]
@assert ext isa Vector{<: AbstractPoint}
@assert ints isa Vector{<: AbstractVector{<: AbstractPoint}}
Polygon(ext, ints) #fails here
end |> MultiPolygon
end
I see, you're right, interesting. One thing I don't understand though, is why would we need to special case nested exterior rings? The spec doesn't talk about this.
I agree that multiple exterior rings would mean a MultiPolygon. One thing that is not clear to me, is how to assign interior rings to exterior ones, like they need to be in MultiPolygon. Is the sequence always like this, where E = Exterior and I = Interior:
E I I E I
----- ___
pol1 pol2
Such that the first ring must always be exterior, and that interior rings belong to the last exterior ring that was before?
Is the sequence always like this
The second last paragraph here https://docs.oracle.com/database/121/SPATL/polygon-hole.htm might point to what you're saying?
As far as I understand the shapefile documentation file, the order is arbitrary. So one needs to check which interior polynomial belongs to which exterior polynomial. (I really hope that I am wrong here, this would simplify things a lot)
this becomes more difficult if exterior polygons are nested, because then the interior polygon belongs to the smallest exterior polygon that it contains. Imagine 2N concentric circles with different radii r_1 > r_2 > ... > r_2N. Then these have to be represented as a Multipolygons with N Polygons (ext=r_1, int=r_2), (ext=r_3, int=r_4), ... (ext=r_2N-1, int=2N)
One would have to partially order the exterior polygons. if the interior poly is contained in one of the maximal ext polies, then one has to look at biggest ext polys that are contained in this poly and so on and so forth
I linked to a gist by @jkrumbiegel above, I think he solves the same problem a little different. I didn't have time yet to digest his code though.
Is the sequence always like this
The second last paragraph here https://docs.oracle.com/database/121/SPATL/polygon-hole.htm might point to what you're saying?
The ESRI Shapefile Technical Description, page 9, second bullet point, it says
The order of rings in the points array is not significant.
I see. This may be an implementation we can borrow from as well: https://github.com/mbostock/shapefile/blob/master/shp/polygon.js
I think if possible I'd like to avoid the additional PolygonOps dependency, and just add the few functions we need here to the code, for internal use.
why don't you want to have it as a dep? It has no dependencies itself, and it really defines only two functions.
If we copy the inpolygon function then we would copy-paste most of the package.
I will extend my comment above into a PR against this PR tomorrow
Hmm yeah good point. I see now PolygonOps is pretty minimal, so I guess it's fine. Still have to get to terms with the fact that we need to do polygon ops to parse polygons...
Not sure how the PolygonOps and shapefile.js algorithms compare though. The JavaScript looks even more compact.
Still have to get to terms with the fact that we need to do polygon ops to parse polygons...
This goes back to the confusion about the definition of polynomials on Slack. PolygonOps is not about GeometryBasics.Polygons, it's about polygons without holes!
So probably it would be nice to have a GeometryBasics.SimplePolygon type (without holes) and then a Polygon consists of a exterior SimplePolygon and interior LineStrings.
Yeah, right now GeometryBasics.Polygon consist of LineString, 1 for exterior, n for interiors.
I'd keep things somewhat close to the Simple Features standard, and would rather use LinearRing than SimplePolygon, since simple in SF is defined like this:
A Geometry is simple if and only if the only self-intersections are at boundary points.
The LinearRing comes back much more, see for instance: https://tools.ietf.org/html/rfc7946#section-3.1.6 https://github.com/JuliaGeo/GeoInterfaceRFC.jl/blob/df62a18d6960007c5c006df875b1654c438d143f/src/types.jl#L19-L20
In terms of representation a LineString and LinearRing would be much the same. We probably need to decide if we also want to have the same point as first and last point just like in the specs, see for instance https://github.com/JuliaGeometry/GeometryBasics.jl/issues/9
Just so that it doesn't get lost. What is to be done here (because it was missing from my PR #40) is
- [ ] port the MultiPolygon tests from https://github.com/GeospatialPython/pyshp/blob/master/test_shapefile.py
- [ ] this will require adapting the MultiPolygon code a bit, because pyshp allows deviations from the shape file standards
- [ ] figure out the correct eltype for MultiPolygons (required for tests to pass)
this is probably material for a separate PR against this branch, but I don't have time for this in the next months.
Is there any chance we can get this into a mergeable state?
@greimel is the current state of Polygon/MultiPolygon according to what you had talked about?
If not, I could help to get this up and running?
I've been using this branch for reading many shapefiles for many countries without problems.
So I think it's mergeable.
- port the MultiPolygon tests from https://github.com/GeospatialPython/pyshp/blob/master/test_shapefile.py
- this will require adapting the MultiPolygon code a bit, because pyshp allows deviations from the shape file standards
I've not yet encountered incorrectly specified shapefiles, so I couldn't justify working on that.
figure out the correct eltype for MultiPolygons (required for tests to pass)
I don't remember why this is going wrong. But I think the problem is that the different MultiPolynomials had different types if they have holes or not or if they consist of multiple segments, so we cannot determine the type in advance.
I wondered if using https://juliafolds.github.io/BangBang.jl/v0.1/#BangBang.push!!-Tuple{Any,Any,Any,Vararg{Any,N}%20where%20N} would be acceptable here?
As an aside, we should add a map of https://en.wikipedia.org/wiki/Baarle-Hertog to the tests. If we can handle this, what else could go wrong.
@greimel can you share that shapefile you're trying to read which gives an error due to
shapes = Vector{Union{jltype,Missing}}(undef, 0) ?