duckdb_spatial icon indicating copy to clipboard operation
duckdb_spatial copied to clipboard

Export to GPKG much slower than expected

Open marklit opened this issue 1 year ago • 2 comments

I downloaded boundaries Shapefiles for Spain from https://www.eea.europa.eu/data-and-maps/data/eea-reference-grids-2/gis-files/spain-shapefile

$ ogrinfo -al Spain_shapefile/es_1km.shp | head -n50
INFO: Open of `/Users/mark/Downloads/Spain_shapefile/es_1km.shp'
      using driver `ESRI Shapefile' successful.

Layer name: es_1km
Geometry: Polygon
Feature Count: 1143447
Extent: (2376000.000000, 1424000.000000) - (4019000.000000, 2818000.000000)
Layer SRS WKT:
PROJCRS["ETRS89 / ETRS-LAEA",
    BASEGEOGCRS["ETRS89",
        DATUM["European Terrestrial Reference System 1989",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4258]],
    CONVERSION["unnamed",
        METHOD["Lambert Azimuthal Equal Area",
            ID["EPSG",9820]],
        PARAMETER["Latitude of natural origin",52,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",10,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["False easting",4321000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",3210000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["x",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["y",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    ID["EPSG",3035]]
Data axis to CRS axis mapping: 1,2
CELLCODE: String (14.0)
EOFORIGIN: Integer64 (11.0)
NOFORIGIN: Integer64 (11.0)
OGRFeature(es_1km):0
  CELLCODE (String) = 1kmE2376N2408
  EOFORIGIN (Integer64) = 2376000
  NOFORIGIN (Integer64) = 2408000
  POLYGON ((2376000 2408000,2376000 2409000,2377000 2409000,2377000 2408000,2376000 2408000))

The ~150 MB of data loads in and is transformed a few seconds.

$ /Volumes/Seagate/duckdb_spatial/build/release/duckdb -unsigned spain.duckdb
v0.7.1 b00b93f0b1
CREATE OR REPLACE TABLE spain_boundaries AS
    SELECT * EXCLUDE(wkb_geometry),
           ST_TRANSFORM(ST_GEOMFROMWKB(wkb_geometry),
                        'EPSG:3035',
                        'EPSG:4326') geom
    FROM ST_READ('Spain_shapefile/es_1km.shp');

INSERT INTO spain_boundaries
    SELECT * EXCLUDE(wkb_geometry),
           ST_TRANSFORM(ST_GEOMFROMWKB(wkb_geometry),
                        'EPSG:3035',
                        'EPSG:4326') geom
    FROM ST_READ('Spain_shapefile/ic_1km.shp');

But exporting this data back out as GPKG takes over an hour.

COPY (SELECT ST_AsWKB(geom) geom
      FROM spain_boundaries)
TO 'spain_boundaries.gpkg'
WITH (FORMAT GDAL,
      DRIVER 'GPKG',
      LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');

Normally just using ogr2ogr the above would finish in seconds.

And for some reason, the above geometry maps over Madagascar instead of Spain.

Screenshot 2023-04-11 at 13 06 59

marklit avatar Apr 11 '23 09:04 marklit