duckdb_spatial
Performance improvements for Geodatabase imports?
I used the latest master branch to convert a 21 GB Geodatabase fileset into Parquet with some light enrichment. It took almost exactly 7 hours on a system with 64 cores and 64 GB of RAM, which works out to a read speed of ~871 KB/s. Is there much that could be done to optimise reads for this format? Most of the datasets I process for clients arrive as Geodatabases.
LOAD parquet;
COPY (SELECT * EXCLUDE(GEOMETRY_BIN),
             printf('%x',
                    h3_latlng_to_cell(
                        ST_Y(ST_CENTROID(GEOMETRY_BIN::GEOMETRY)),
                        ST_X(ST_CENTROID(GEOMETRY_BIN::GEOMETRY)),
                        7)::bigint) as h3_7,
             printf('%x',
                    h3_latlng_to_cell(
                        ST_Y(ST_CENTROID(GEOMETRY_BIN::GEOMETRY)),
                        ST_X(ST_CENTROID(GEOMETRY_BIN::GEOMETRY)),
                        8)::bigint) as h3_8,
             printf('%x',
                    h3_latlng_to_cell(
                        ST_Y(ST_CENTROID(GEOMETRY_BIN::GEOMETRY)),
                        ST_X(ST_CENTROID(GEOMETRY_BIN::GEOMETRY)),
                        9)::bigint) as h3_9,
             ST_AsHEXWKB(GEOMETRY_BIN::GEOMETRY)::TEXT AS geom
      FROM st_read('test.gdb/a00000011.gdbtable'))
TO 'test.gdb/a00000011.pq' (FORMAT 'PARQUET',
                            CODEC 'Snappy');
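
For comparison, the query above spells out the centroid expression six times per row. Below is a sketch of the same export with the centroid computed once in a CTE and reused for each H3 resolution. It assumes the source table has no columns already named `centroid`, `geom`, or `h3_7`/`h3_8`/`h3_9`; the CTE name `src` and the `centroid` alias are made up for illustration. Whether this changes the runtime at all is untested, since the engine may already deduplicate the repeated expressions, and the bottleneck may be the Geodatabase read itself.

```sql
LOAD parquet;

COPY (
    -- Cast the geometry and compute its centroid once per row.
    WITH src AS (
        SELECT * EXCLUDE (GEOMETRY_BIN),
               ST_Centroid(GEOMETRY_BIN::GEOMETRY)       AS centroid,
               ST_AsHEXWKB(GEOMETRY_BIN::GEOMETRY)::TEXT AS geom
        FROM st_read('test.gdb/a00000011.gdbtable')
    )
    -- Reuse the centroid for each H3 resolution; output columns match
    -- the original query (attributes, h3_7, h3_8, h3_9, geom).
    SELECT * EXCLUDE (centroid, geom),
           printf('%x', h3_latlng_to_cell(ST_Y(centroid), ST_X(centroid), 7)::bigint) AS h3_7,
           printf('%x', h3_latlng_to_cell(ST_Y(centroid), ST_X(centroid), 8)::bigint) AS h3_8,
           printf('%x', h3_latlng_to_cell(ST_Y(centroid), ST_X(centroid), 9)::bigint) AS h3_9,
           geom
    FROM src
)
TO 'test.gdb/a00000011.pq' (FORMAT 'PARQUET',
                            CODEC 'Snappy');
```

Running the plain `SELECT ... FROM st_read(...)` part without the H3 columns would also help separate the cost of reading the Geodatabase from the cost of the enrichment.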