Excessive memory usage with prequantization enabled

Open Zaczero opened this issue 1 year ago • 5 comments

I am primarily posting this issue for future people facing a similar problem.

In my case, when the prequantize option is enabled (which is the default setting), the toposimplify method consumes 25GB of memory. However, when I disable the prequantize option, memory usage peaks at just 5GB. I utilize shapely for simplification.


Reproduction steps

  1. Download both parts of the archive: countries1.zip countries2.zip

  2. Combine the archives:

cat countries1.zip countries2.zip > countries.zip
  3. Unzip it.

  4. Execute the following Python code snippet:

import json

from shapely.geometry import shape
import topojson as tp
with open('countries.geojson', 'rb') as f:
    features = json.load(f)['features']
countries_geoms = [shape(feature['geometry']) for feature in features]
topo = tp.Topology(countries_geoms)
topo.toposimplify(0.00001, inplace=True)
  5. Monitor memory usage.

  6. To resolve the issue, replace the topo assignment with:

topo = tp.Topology(countries_geoms, prequantize=False)
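
One possible way to do the "monitor memory usage" step from inside Python is the standard-library tracemalloc module. The sketch below is only an illustration: the helper name peak_traced_mib and the stand-in workloads are made up here, and tracemalloc only sees allocations made through Python's allocator, so memory allocated inside native libraries may be missing and the figures can be lower than what the operating system reports.

```python
import tracemalloc
from typing import Any, Callable


def peak_traced_mib(build: Callable[[], Any]) -> float:
    """Run build() and return the peak traced allocation size in MiB."""
    tracemalloc.start()
    try:
        build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also clears traces, so each call starts fresh
    return peak / 2**20


# Stand-in workloads (hypothetical). To compare the two settings, swap in e.g.
#   lambda: tp.Topology(countries_geoms).toposimplify(0.00001)
#   lambda: tp.Topology(countries_geoms, prequantize=False).toposimplify(0.00001)
small = peak_traced_mib(lambda: [0] * 10_000)
large = peak_traced_mib(lambda: [0] * 1_000_000)
print(f"small: {small:.2f} MiB, large: {large:.2f} MiB")
```

For whole-process numbers, an external tool such as the system monitor remains the more faithful measurement.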

By the way, should prequantization be enabled by default? I personally find it odd that the library performs certain calculations by default, even if they don't apply to my use case and don't provide any benefit. I can only understand such default behavior if it benefits everyone. Otherwise, this should be an opt-in operation (the same as simplification is opt-in).

Zaczero, Nov 30 '23 23:11

Thank you for raising the issue, and it is great to see that you find this package useful for your needs! Until now, speed has been the main bottleneck, but if we can reduce the memory footprint, that would be great too. It's worth profiling the code to find the main culprit that is causing the memory to blow up.
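
As a starting point for that kind of profiling, the standard-library tracemalloc module can rank source lines by how much memory they allocated. This is a minimal sketch, not something taken from the topojson code base: the helper name top_allocation_sites is hypothetical, only memory still alive when the workload returns is counted (transient peaks are not), and allocations made inside native libraries may not be visible.

```python
import tracemalloc


def top_allocation_sites(workload, limit=5):
    """Run workload() and return the source lines whose live memory grew most."""
    tracemalloc.start(10)  # keep up to 10 frames per allocation
    try:
        before = tracemalloc.take_snapshot()
        result = workload()  # hold a reference so the allocations stay alive
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to() returns StatisticDiff entries, largest size difference first.
    return after.compare_to(before, "lineno")[:limit]


# Stand-in workload (hypothetical); a Topology build could be passed instead.
for stat in top_allocation_sites(lambda: [bytes(1000) for _ in range(1000)]):
    print(stat)
```

Each printed entry names a file and line number together with the size and count difference, which narrows down where to look next.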

mattijn, Dec 02 '23 10:12

🙂! If you are interested, I use this package to run https://github.com/Zaczero/osm-countries-geojson. It finally resolved the issue with overlaps/gaps produced during the simplification process. And now it's perfect!

Zaczero, Dec 02 '23 11:12

Thanks for showing your package! May I ask how the directed graph of networkx is being utilised for your use-case? That seems interesting!

I was looking at the GeoJSON you referenced and noticed at least two things that you might want to check.

  • it seems there is a (part of a) country missing near Morocco: image

  • something odd is going on in the south of the Netherlands: image

Again, thanks for reaching out!

mattijn, Dec 02 '23 23:12

  1. This is simply the nature of OSM data. In regions of conflict, it's common to encounter such situations. Sometimes, you might even come across two countries at the same time: 2023-12-03_00-45-33

  2. This appears to be a bug with GitHub's GeoJSON visualizer. They seem to apply their own simplification for rendering. Here's how this location appears on OSM: image And this is how it looks when rendered locally (which is acceptable for such a high level of simplification): image

I understand that the documentation for the countries generator is lacking. Essentially, the directed graph is utilized to reconstruct country polygons efficiently from split and randomly ordered line segments. OSM data does not store countries as predefined shapes but rather as a collection of lines. The directed graph (compared to an undirected one) improves performance by reducing the number of paths that simple_cycles has to traverse. Each node represents an intersection (a line endpoint), and each edge represents a line segment.
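
To make the reconstruction idea concrete, here is a small standard-library sketch. It is not the generator's actual code and does not use networkx: it is a simplified stand-in that assumes every endpoint is shared by exactly two segments, so plain endpoint matching is enough and no cycle search is needed. The name stitch_rings and the sample coordinates are made up for illustration.

```python
from collections import defaultdict

Point = tuple[float, float]


def stitch_rings(segments: list[list[Point]]) -> list[list[Point]]:
    """Join unordered, arbitrarily oriented segments into closed rings.

    Simplifying assumption: every endpoint is shared by exactly two segments.
    """
    # Map each endpoint (graph node) to the segments (edges) that touch it.
    by_endpoint: dict[Point, list[int]] = defaultdict(list)
    for i, seg in enumerate(segments):
        by_endpoint[seg[0]].append(i)
        by_endpoint[seg[-1]].append(i)

    unused = set(range(len(segments)))
    rings = []
    while unused:
        i = min(unused)
        unused.remove(i)
        ring = list(segments[i])
        # Extend the ring until its last point meets its first point.
        while ring[-1] != ring[0]:
            candidates = [j for j in by_endpoint[ring[-1]] if j in unused]
            if not candidates:
                raise ValueError(f"open boundary at {ring[-1]}")
            j = candidates[0]
            unused.remove(j)
            seg = segments[j]
            # Reverse the segment when it is stored end-to-start.
            if seg[0] != ring[-1]:
                seg = seg[::-1]
            ring.extend(seg[1:])
        rings.append(ring)
    return rings


# A unit square given as four shuffled segments, two of them reversed.
square = [
    [(1, 1), (0, 1)],
    [(0, 0), (1, 0)],
    [(0, 0), (0, 1)],
    [(1, 1), (1, 0)],
]
print(stitch_rings(square))
```

Real border data violates the two-segments-per-endpoint assumption wherever three or more countries meet, which is where a graph with a cycle search becomes necessary.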

image

Zaczero, Dec 02 '23 23:12

Interesting! Halfway through the computation of a topology, the line segments are also split, and their order is not always clear. In the hashmap step I use a _hash_order() to determine the order. Maybe I could have used a directed graph there as well. Regarding 1), I can understand a single place being claimed by multiple countries, but I didn't expect a place not to be claimed by any country. Regarding 2), the OSM location seems to be OK; the border is a bit messy there. Maybe it's a glitch when zooming out.

mattijn, Dec 03 '23 00:12