[Bug] r.fill.dir Segfault/stack overflow with larger datasets

Open byteit101 opened this issue 1 year ago • 1 comments

Describe the bug

Large datasets seem to cause a segfault. The resulting core dump shows a stack trace roughly 100k frames deep:

(gdb) bt
#0  0x0000555c25ff41a6 in recurse_list (flag=flag@entry=117042, cells=cells@entry=0x7fa04fa00010, sz=sz@entry=4252788, start=<optimized out>) at ./raster/r.fill.dir/dopolys.c:26
#1  0x0000555c25ff41b5 in recurse_list (flag=flag@entry=117042, cells=cells@entry=0x7fa04fa00010, sz=sz@entry=4252788, start=<optimized out>) at ./raster/r.fill.dir/dopolys.c:26
#2  0x0000555c25ff41b5 in recurse_list (flag=flag@entry=117042, cells=cells@entry=0x7fa04fa00010, sz=sz@entry=4252788, start=<optimized out>) at ./raster/r.fill.dir/dopolys.c:26
...snip...
#104756 0x0000555c25ff41b5 in recurse_list (flag=flag@entry=117042, cells=cells@entry=0x7fa04fa00010, sz=sz@entry=4252788, start=<optimized out>) at ./raster/r.fill.dir/dopolys.c:26
#104757 0x0000555c25ff41b5 in recurse_list (flag=flag@entry=117042, cells=cells@entry=0x7fa04fa00010, sz=sz@entry=4252788, start=<optimized out>) at ./raster/r.fill.dir/dopolys.c:26
#104758 0x0000555c25ff41b5 in recurse_list (flag=flag@entry=117042, cells=cells@entry=0x7fa04fa00010, sz=sz@entry=4252788, start=<optimized out>) at ./raster/r.fill.dir/dopolys.c:26
#104759 0x0000555c25ff43d9 in dopolys (fd=fd@entry=6, fm=fm@entry=7, nl=nl@entry=11011, ns=ns@entry=14056) at ./raster/r.fill.dir/dopolys.c:81
#104760 0x0000555c25ff399c in main (argc=<optimized out>, argv=<optimized out>) at ./raster/r.fill.dir/main.c:210

It seems that https://trac.osgeo.org/grass/ticket/2742 wasn't actually fixed.
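The backtrace shows recurse_list calling itself more than 100k frames deep, which is the usual signature of recursion exhausting the thread's stack rather than running out of heap memory. One common remedy for this class of crash is to replace the recursion with an explicit, heap-allocated work stack. The following is only a minimal sketch of that technique and is not the code in dopolys.c: the `struct cell` layout, the function name `label_connected`, and the 8-neighbour adjacency test are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cell record; NOT the layout used in dopolys.c. */
struct cell {
    int row;
    int col;
    int label; /* 0 means "not yet labelled" */
};

/*
 * Label every cell that is 8-connected (directly or transitively) to
 * cells[start]. Pending cells are kept on a heap-allocated stack instead
 * of the call stack, so the depth is bounded by available memory.
 * Returns the number of cells labelled, or -1 if allocation fails.
 */
static long label_connected(struct cell *cells, size_t n, size_t start,
                            int label)
{
    assert(cells != NULL && start < n && label != 0);

    /* Each cell is pushed at most once, so n slots are always enough. */
    size_t *stack = malloc(n * sizeof *stack);
    size_t top = 0;
    long count = 0;

    if (stack == NULL)
        return -1;

    cells[start].label = label;
    stack[top++] = start;
    count++;

    while (top > 0) {
        size_t cur = stack[--top];

        /* Linear scan for unlabelled neighbours of the current cell. */
        for (size_t k = 0; k < n; k++) {
            if (cells[k].label != 0)
                continue;
            if (abs(cells[k].row - cells[cur].row) <= 1 &&
                abs(cells[k].col - cells[cur].col) <= 1) {
                cells[k].label = label; /* mark before pushing */
                stack[top++] = k;
                count++;
            }
        }
    }

    free(stack);
    return count;
}
```

Marking a cell before pushing it guarantees it is never queued twice, which is what keeps the work stack within n entries. The linear neighbour scan is kept for brevity; it is quadratic in the worst case and would need a spatial index to be practical on a raster of this size.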

To reproduce

  1. Download a large chunk of DEM data. I used https://maps.vcgi.vermont.gov/opendata/clipandzip.html?InputLayerName=IMG_VCGI_LIDARDEM_SP_v3&InputFtype=raster with a chunk about 3/4 the size of the towns in the bottom right. Mine was about 600MB.
  2. Extract and import
  3. Run r.fill.dir with it
  4. See error

Memory was not the problem: I had about 16GB free, and it went unused while this was running.

Expected behavior

No Crash

Screenshots

Importing raster map <rast_667ee69b044a211>...
0..3..6..9..12..15..18..21..24..27..30..33..36..39..42..45..48..51..54..57..60..63..66..69..72..75..78..81..84..87..90..93..96..99..100
Reading input elevation raster map...
0..3..6..9..12..15..18..21..24..27..30..33..36..39..42..45..48..51..54..57..60..63..66..69..72..75..78..81..84..87..90..93..96..99..100
Filling sinks...
Determining flow directions for ambiguous cases...
Segmentation fault (core dumped)

System description

  • Operating System: Debian GNU/Linux 12
  • GRASS GIS version: 8.4.1 (Debian)
QGIS version 3.22.16-Białowieża - QGIS code branch - Release 3.22
Qt version 5.15.8
Python version 3.11.1
GDAL/OGR version 3.6.2
PROJ version 9.1.1
EPSG Registry database version v10.076 (2022-08-31)
GEOS version 3.11.1-CAPI-1.17.1
SQLite version 3.40.1
PostgreSQL client version 15.1 (Debian 15.1-1+b1)
SpatiaLite version 5.0.1
QWT version 6.1.4
QScintilla2 version 2.13.3
OS version Debian GNU/Linux 12 (bookworm)
   
Active Python plugins
quick_map_services 0.19.29
DEMto3D 3.6
MetaSearch 0.3.5
db_manager 0.1.20
grassprovider 2.12.99
processing 2.12.99
sagaprovider 2.12.99

byteit101 avatar Jun 28 '24 17:06 byteit101

As there is no link from the Trac ticket to an actual commit, I'd guess it didn't get fixed. Please provide your computational region parameters (g.region -p) so we can determine how large "large" is.

marisn avatar Jun 30 '24 13:06 marisn

Hmm, I am unsure where or how to use that command, but QGIS's layer properties say:

Total size: 528.95 MB
Width: 14056
Height: 11011
Data type: Float32 - Thirty two bit floating point
GDAL Driver Description: HFA
GDAL Driver Metadata: Erdas Imagine Images (.img)
Dimensions: X: 14056 Y: 11011 Bands: 1
Pixel Size: 0.6999999999484479707,-0.6999999999484491919
Name: NAD83 / Vermont
Units: meters
Method: Transverse Mercator

byteit101 avatar Jul 02 '24 21:07 byteit101

I can confirm. GRASS 8.4.0, Debian Linux/12, AMD64. GTiff/GeoTIFF, 1467x2658, 1 band, Float32. Runs for about 2 days, then segfaults.

cswingle avatar Feb 26 '25 17:02 cswingle

Is it possible to get a smaller reproduction, one that doesn't take 2 days to trigger? Does constraining the machine resources in a VM help trigger it?

echoix avatar Feb 26 '25 17:02 echoix

I know my source (using the Vermont GIS export link, as I described) was fairly quick (under 5 minutes, I think).

byteit101 avatar Feb 26 '25 17:02 byteit101