duckdb_spatial

pure virtual method called

Open marklit opened this issue 1 year ago • 4 comments

This is running on Arch Linux on my Steam Deck, which has 16 GB of RAM and ~200 GB of free disk space at the moment. htop reports ~4.2% RES MEM usage while the script runs.

$ aws s3 --no-sign-request cp s3://naturalearth/10m_cultural/ne_10m_admin_0_countries.zip ./
$ unzip ne_10m_admin_0_countries.zip
$ wc -l h3_5s.txt # 14,290 records

$ python3 -m venv ~/.osm
$ source ~/.osm/bin/activate
$ pip install duckdb h3 rich # DuckDB 0.10.0

$ python3 # 3.11
import duckdb
import h3
from rich.progress import track


def get_iso3s():
    lookups = {x: h3.h3_to_geo(x)
               for x in open('h3_5s.txt').read().strip().splitlines() 
               if len(x) > 14}

    con = duckdb.connect(database=":memory:")
    con.execute('INSTALL spatial')
    con.execute('LOAD spatial')

    out = {}

    sql_iso = """select ISO_A3 
                 from st_read('ne_10m_admin_0_countries.shx') 
                 where st_contains(geom, st_point(?, ?)) 
                 limit 1;"""

    for h3_5, (lat, lon) in track(lookups.items()):
        iso3 = con.sql(sql_iso, params=(lon, lat)).fetchall()

        if iso3:
            out[h3_5] = iso3[0][0]

    return out


iso3s = get_iso3s()

I get the following output roughly 80% of the way through the ~14K lookups. That 80% point isn't fixed: I've run this 10-15 times now, and it fails at a different point in the progress each time.

pure virtual method called
terminate called without an active exception
Aborted (core dumped)

I tried converting the source data to GPKG to see if that would get around this, but reading the resulting file raised an IO Error.

$ ogr2ogr ne_10m_admin_0_countries.gpkg -f GPKG ne_10m_admin_0_countries.shx
Warning 1: A geometry of type MULTIPOLYGON is inserted into layer ne_10m_admin_0_countries of geometry type POLYGON, which is not normally allowed by the GeoPackage specification, but the driver will however do it. To create a conformant GeoPackage, if using ogr2ogr, the -nlt option can be used to override the layer geometry type. This warning will no longer be emitted for this combination of layer and feature geometry type.
SELECT * FROM ST_READ('ne_10m_admin_0_countries.gpkg') LIMIT 1;
Error: IO Error: GDAL Error (1): too many arguments on function OGR_GPKG_FillArrowArray_INTERNAL

marklit Mar 10 '24 17:03

Attempt number 20 or so managed to finish without issue. I'm not sure if there is something wrong with my system or the OS.

marklit Mar 10 '24 18:03

Hi! It sounds like there are several different issues at play here. The pure virtual abort in particular seems like it would be hard to reproduce and could originate in spatial, the Python bindings, or GDAL. Regarding the GPKG warning/error, it could just be that GDAL's Arrow API can't handle non-conformant GPKGs. I'll try to have a look after the next release.

Spatial has also been shipping an undocumented prototype native shapefile reader function, ST_ReadSHP. It doesn't handle all kinds of text encoding options and it needs a lot more testing, but it could act as a temporary workaround for your issue.

Maxxen Mar 13 '24 11:03

If you have any CLI commands that could help pinpoint where the problem is, I can run them on my Steam Deck.

I'll have a look into that new ST command.

marklit Mar 13 '24 12:03

Error: IO Error: GDAL Error (1): too many arguments on function OGR_GPKG_FillArrowArray_INTERNAL

This is https://github.com/OSGeo/gdal/issues/8757 , which was fixed in GDAL 3.8.1

rouault Apr 20 '24 15:04