rmapshaper

Processing large (~1GB) shp file with ms_simplify results in hang

Open: abasees opened this issue 2 years ago • 1 comment

I'm encountering issues when running the code below on a shp file of approximately 1GB with 242K features (US Census geography files).

boundary_simp = ms_simplify(boundary_in, method="dp", keep = ratio, keep_shapes=TRUE, sys=TRUE)

When I run the above code on this file, the process runs for a couple of minutes and then RAM usage explodes (it attempts to use all 256GB of RAM). After some time the usage drops again and the process appears to more or less hang for hours, although it does not look like it is 100% idle, as CPU and RAM utilisation fluctuate slightly. No error message appears, nor any of the progress messages that would normally occur (e.g. allocation of heap memory), to help diagnose this further.

Given the above, I ran a test on a file 1/5 the size, which completes successfully within 3 or so minutes. This may indicate a bottleneck in the process, likely in the conversion to GeoJSON (similar past issues point to this as a potential cause).

Is there a workaround (for example, supplying GeoJSON directly rather than a shp file) or some other solution for this type of problem?

abasees, Mar 02 '22 07:03

Hi @abasees - a couple of things to try. Did you try increasing sys_mem in the ms_simplify() call? The default is 8 GB, but you can specify up to your available system memory (note that this only works when sys = TRUE, as you have here).

You could also try it using the mapshaper command-line tool directly with something like:

mapshaper my_file.shp -simplify 0.1 dp keep-shapes -o simplified.shp
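If the plain mapshaper command also runs short of memory, the mapshaper-xl wrapper that ships with mapshaper accepts a memory size as its first argument. The following is only a sketch: the file names and the 32gb figure are placeholder assumptions, and it prints the command instead of running it when mapshaper-xl or the input file is not found.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust to your data and machine.
input="my_file.shp"
output="simplified.shp"
heap="32gb"   # memory to give mapshaper-xl; pick a value below your physical RAM

# Same simplification as the command above, run through mapshaper-xl.
cmd="mapshaper-xl $heap $input -simplify 0.1 dp keep-shapes -o $output"

if command -v mapshaper-xl >/dev/null 2>&1 && [ -f "$input" ]; then
  # Left unquoted on purpose so the string splits into separate arguments
  # (this assumes the file names contain no spaces).
  $cmd
else
  echo "dry run: $cmd"
fi
```

Running it outside R also takes the R session and its object conversion out of the picture, which should help narrow down where the time is being spent.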

Other than those suggestions, I'd need a minimal reproducible example to see where else things could be helped.

(As an aside, unless you have a specific need for it, the default Visvalingam simplification method is usually much better than the Douglas-Peucker (dp) method: https://github.com/mbloch/mapshaper/wiki/Simplification-Tips)
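To try the default method on the command line, leaving out the dp keyword should be enough, since mapshaper falls back to its weighted Visvalingam method when no method is named. This is a sketch with placeholder file names; it prints the command instead of running it when mapshaper or the input file is not available.

```shell
#!/bin/sh
# Placeholder file names (assumptions).
input="my_file.shp"
output="simplified_visvalingam.shp"

# No method keyword after -simplify, so mapshaper uses its default method.
cmd="mapshaper $input -simplify 0.1 keep-shapes -o $output"

if command -v mapshaper >/dev/null 2>&1 && [ -f "$input" ]; then
  # Left unquoted on purpose so the string splits into separate arguments.
  $cmd
else
  echo "dry run: $cmd"
fi
```

In the R call, the equivalent is to drop the method = "dp" argument from ms_simplify().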

ateucher, Mar 02 '22 23:03