
Can Numba or Jax be used to speed up processes?

Open mattijn opened this issue 3 years ago • 4 comments

As raised in https://github.com/mattijn/topojson/issues/110#issuecomment-829226754, it might be good to study if Numba can be optionally used to speed up processes within computations.
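
To make "optionally used" concrete, here is a minimal sketch of the usual optional-dependency pattern: try to import Numba and fall back to a no-op decorator when it is missing. The function dp_mask is a hypothetical illustration (an iterative Douglas-Peucker keep-mask over a NumPy coordinate array); it is not taken from topojson, and whether it would help on real topologies is untested here.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(*args, **kwargs):
        """Fallback used when Numba is absent: leave the function unchanged."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]          # used as bare @njit
        return lambda func: func    # used as @njit(...)


@njit
def dp_mask(coords, epsilon):
    """Return a boolean keep-mask for an (n, 2) float array of vertices.

    Hypothetical helper: Douglas-Peucker with an explicit stack, measuring
    each vertex's distance to the line through the current end points.
    """
    n = coords.shape[0]
    if n < 3:
        return np.ones(n, dtype=np.bool_)
    keep = np.zeros(n, dtype=np.bool_)
    keep[0] = True
    keep[n - 1] = True
    # At most n - 1 disjoint index intervals can be pending at once.
    stack = np.empty((n, 2), dtype=np.int64)
    stack[0, 0] = 0
    stack[0, 1] = n - 1
    top = 1
    eps2 = epsilon * epsilon
    while top > 0:
        top -= 1
        start = stack[top, 0]
        end = stack[top, 1]
        if end - start < 2:
            continue
        x0 = coords[start, 0]
        y0 = coords[start, 1]
        dx = coords[end, 0] - x0
        dy = coords[end, 1] - y0
        seg_len2 = dx * dx + dy * dy
        max_d2 = -1.0
        idx = start
        for i in range(start + 1, end):
            px = coords[i, 0] - x0
            py = coords[i, 1] - y0
            if seg_len2 == 0.0:
                d2 = px * px + py * py          # degenerate chord
            else:
                cross = px * dy - py * dx
                d2 = cross * cross / seg_len2   # squared distance to line
            if d2 > max_d2:
                max_d2 = d2
                idx = i
        if max_d2 > eps2:
            # Keep the farthest vertex and process both halves.
            keep[idx] = True
            stack[top, 0] = start
            stack[top, 1] = idx
            top += 1
            stack[top, 0] = idx
            stack[top, 1] = end
            top += 1
    return keep
```

Calling coords[dp_mask(coords, epsilon)] returns the simplified vertices. The same code runs with or without Numba installed, which is the point of keeping the dependency optional; only the speed differs.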

mattijn avatar Apr 29 '21 20:04 mattijn

Hi! Regarding speedups, correct me if I'm wrong, but a large speedup might be achieved by parallelizing this loop: https://github.com/mattijn/topojson/blob/master/topojson/ops.py#L627

brotherofken avatar Sep 16 '21 11:09 brotherofken

Thanks for commenting! This might be one of the locations that can be parallelized if simplification is a bottleneck for you. In my experience the computation of the Topology itself is sometimes more troublesome.

The line you point to is for shapely simplification. It is also possible to use the optional simplification package instead, see https://mattijn.github.io/topojson/example/settings-tuning.html#simplify_algorithm.

Although that is also a for-loop: https://github.com/mattijn/topojson/blob/master/topojson/ops.py#L635.

Another option to speed up pre/topo-simplification is to see how shapely 2.0 (pygeos) will perform. If there is support for ragged arrays, then the simplification of linestrings/arcs can probably be done at array level in C.
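
As a rough sketch of what array-level simplification could look like, assuming shapely >= 2.0 is installed: shapely.simplify accepts an array of geometries and runs the loop in C. The two linestrings below are made-up stand-ins for topology arcs, and this does not show how topojson itself would wire it in.

```python
import numpy as np
import shapely
from shapely.geometry import LineString

# Hypothetical arcs; in practice these would come from the topology.
arcs = np.array(
    [
        LineString([(0, 0), (1, 0.01), (2, 0), (3, 0.01), (4, 0)]),
        LineString([(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]),
    ],
    dtype=object,
)

# One vectorized call instead of a Python for-loop over the arcs.
simplified = shapely.simplify(arcs, tolerance=1.0, preserve_topology=False)

# Vertex count per simplified arc.
counts = shapely.get_num_coordinates(simplified)
```

Here preserve_topology=False selects plain Douglas-Peucker; whether that matches the settings used in ops.py is not checked here.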

mattijn avatar Sep 17 '21 15:09 mattijn

Comparison against the blog post "Simplifying a map layer using PostGIS topology" (posted on 13 April 2012): https://strk.kbt.io/blog/2012/04/13/simplifying-a-map-layer-using-postgis-topology/

PostGIS topology timing:

Total seconds ~13, vertices down to 1369 (from 47036).


Python topojson timing:

Total milliseconds ~457, vertices down to 1015. That is ~28x faster (13 s vs. 0.457 s), though part of the difference may be due to better hardware.


Code used:

import topojson as tp
import geopandas as gpd
import numpy as np

gdf = gpd.read_file(r"https://www.geotests.net/cours/sigma/webmapping/donnees_pg/GEOFLADept_FR_Corse_AV_L93.zip")
gdf.head()

# building topology
topo = tp.Topology(gdf, prequantize=3000)  # 270ms

# apply simplify on arcs
topo = topo.toposimplify(10000)  # 164ms

# visualize
topo.to_alt()  # 94ms

# or as oneliner
# tp.Topology(gdf, prequantize=10000, toposimplify=10000).to_alt()  # 457ms


# count vertices
arcs = topo.to_dict()['arcs']
np.sum([len(arc) for arc in arcs])  # 1015

mattijn avatar Jun 22 '22 12:06 mattijn

Or Jax

mattijn avatar Sep 08 '22 14:09 mattijn