topojson icon indicating copy to clipboard operation
topojson copied to clipboard

Conversion to Typology object causes overlaps

Open Dave-Evans opened this issue 2 years ago • 3 comments

Hello and thank you for your work on this project.

I was noticing an issue with a multipolygon layer I have which results from a conversion from a raster layer. There are five different strata in this example and there are no overlaps with the initial data. I am interested in creating a topojson and then simplifying the boundaries.

After conversion to a Topology object, there are a number of overlapping areas which result. Unless I misunderstand what this library should do, the resulting layer should not have any overlaps.

This is illustrated below when plotting the strata with a bit of transparency. In the most obvious case, in central area you can see a brown where stratum 2 overlaps with strata 3; this area should be entirely strata 3. There are other areas of overlap as well.

import topojson as tp
import geopandas as gpd
import matplotlib.pyplot as plt
fl_strata = "strata.geojson"

strata_data = gpd.read_file(fl_strata)

topo = tp.Topology(strata_data, prequantize=False)

topo_gdf = topo.to_gdf()
topo_gdf.plot(categorical=True, cmap='tab10', legend=True, column='StrataID', alpha=0.6)
plt.show()

Example data is attached. example.zip

Thanks again for all your help, Dave

Dave-Evans avatar Mar 10 '22 03:03 Dave-Evans

You are right. I can reproduce this with solely StrataID is 2 as well:

strata_data = strata_data[strata_data.StrataID.isin(["2"])]
strata_data.plot(categorical=True, cmap='tab10', legend=True, column='StrataID', alpha=0.6)
image

and

(
    tp.Topology(strata_data, prequantize=False)
    .to_gdf()
    .plot(categorical=True, cmap='tab10', legend=True, column='StrataID', alpha=0.6)
)
image

The good thing though, is that it serialise OK into altair:

tp.Topology(strata_data, prequantize=False).to_alt(color='properties.StrataID:N')
image Meaning that the creation of topojson is going good, but the serialization into a GeoJSON not, relevant code here [https://github.com/mattijn/topojson/blob/master/topojson/utils.py#L478](https://github.com/mattijn/topojson/blob/master/topojson/utils.py#L478)

mattijn avatar Apr 07 '22 20:04 mattijn

Simplified example to continue testing:

from shapely import geometry
import topojson as tp 

mp = geometry.shape({
    "type": "MultiPolygon",
    "coordinates": [
        [
            [[0, 0], [20, 0], [10, 20], [0, 0]],  # CCW
            [[3, 2], [10, 16], [17, 2], [3, 2]],  # CW
        ],
        [[[6, 4], [14, 4], [10, 12], [6, 4]]],  # CCW
    ]
})

topo = tp.Topology(mp, prequantize=False)
topo.to_alt(color='type:N')  # this goes well
image
topo.to_gdf().plot(alpha=0.5) # this not
image
topo.to_gdf().iloc[0].geometry.wkt  # newly introduced ')' and '('s
'MULTIPOLYGON (((0 0, 20 0, 10 20, 0 0)), ((3 2, 17 2, 10 16, 3 2)), ((6 4, 14 4, 10 12, 6 4)))'
mp.wkt  # original wkt
'MULTIPOLYGON (((0 0, 20 0, 10 20, 0 0), (3 2, 10 16, 17 2, 3 2)), ((6 4, 14 4, 10 12, 6 4)))'

mattijn avatar Apr 11 '22 22:04 mattijn

I've added a test here and have been trying to understand what happens or what should happen around here in the Hashmap class:

arcs = []
for geom in geoms:
    arcs_in_geom = self._data[bk_objects][geom]  # bk_objects = "bookkeeping_geoms"
    for idx_arc, arc_ref in enumerate(arcs_in_geom):
        arc_ids = self._data[bk_element][arc_ref]  # bk_element = "bookkeeping_arcs"
        if len(arc_ids) > 1 and key != "coordinates":
            self._inner = True if idx_arc > 0 else False
            arc_ids = self._backward_arcs(arc_ids)

        arcs.append(arc_ids)
return arcs

After Dedup the object looks as follow:

from topojson.core.dedup import Dedup
from topojson.core.hashmap import Hashmap
Dedup(
{'bbox': (0.0, 0.0, 20.0, 20.0),
 'bookkeeping_arcs': [[0], [1], [2]], <<<<<<<<<
 'bookkeeping_coords': [],
 'bookkeeping_duplicates': [],
 'bookkeeping_geoms': [[0, 1], [2]],  <<<<<<<<<
 'bookkeeping_shared_arcs': [],
 'coordinates': [],
 'junctions': [<shapely.geometry.point.Point object at 0x7fb94fe7e520>,
               <shapely.geometry.point.Point object at 0x7fb953843df0>,
               <shapely.geometry.point.Point object at 0x7fb953843e80>],
 'linestrings': [array([[ 0.,  0.],
       [20.,  0.],
       [10., 20.],
       [ 0.,  0.]]),
                 array([[ 3.,  2.],
       [10., 16.],
       [17.,  2.],
       [ 3.,  2.]]),
                 array([[ 6.,  4.],
       [14.,  4.],
       [10., 12.],
       [ 6.,  4.]])],
 'objects': {0: {'arcs': [0, 1], 'type': 'MultiPolygon'}},
 'type': 'Topology'}
)

This still seems to be okay. bookkeeping_geoms is [[0, 1], [2]], so len() is two and bookkeeping_arcs is [[0], [1], [2]] meaning that the linestrings do not contain shared segments. But after Hashmap it looks as follow:

Hashmap(mp)
Hashmap(
{'bbox': (0.0, 0.0, 20.0, 20.0),
 'coordinates': [],
 'linestrings': [array([[ 0.,  0.],
       [20.,  0.],
       [10., 20.],
       [ 0.,  0.]]),
                 array([[ 3.,  2.],
       [10., 16.],
       [17.,  2.],
       [ 3.,  2.]]),
                 array([[ 6.,  4.],
       [14.,  4.],
       [10., 12.],
       [ 6.,  4.]])],
 'objects': {'data': {'geometries': [{'arcs': [[[0]], [[1]], [[2]]],  <<<<<<<<<
                                      'type': 'MultiPolygon'}],
                      'type': 'GeometryCollection'}},
 'type': 'Topology'}
)

len of the arcs should stay 2, but has become 3. And the arc does not reflect the geom anymore. The nested hole has become a unique polygon in the multipolygon instead of a hole in the first polygon...

It is a bit complicated, help welcome 🙏

mattijn avatar May 25 '22 22:05 mattijn

This issue is fixed by PR #168 🥳. Result will become:

This took quite some time unfortunately.

mattijn avatar Aug 13 '22 20:08 mattijn

Wow, thank you so much for your work. This is excellent! Many thanks!

Dave-Evans avatar Aug 14 '22 00:08 Dave-Evans