duckdb_spatial icon indicating copy to clipboard operation
duckdb_spatial copied to clipboard

Feature: Native WKT reader/writer

Open Maxxen opened this issue 2 years ago • 3 comments

We currently rely on GEOS for writing and reading WKT, which requires an unnecessary copy and allocation step to convert back and forth between DuckDB and GEOS geometries.

We should implement our own WKT parser/writer that can use arena allocation.

Since we also only support of a subset of WKT (no extended WKT) this would also give us more control and ensure we don't accidentally import geometries we can't support - basically throw at the parsing step.

Maxxen avatar Mar 24 '23 16:03 Maxxen

Does the above mean there is a way to convert WKT strings into a GEOMETRY or WKB_BLOB type already?

D DESCRIBE FROM read_ndjson_auto('shapes_all_low.jsonl') LIMIT 1;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ geoname_id  │ UBIGINT     │ YES     │         │         │         │
│ wkt         │ VARCHAR     │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
D select wkt::GEOMETRY FROM read_ndjson_auto('shapes_all_low.jsonl') LIMIT 1;
libc++abi: terminating with uncaught exception of type duckdb::NotImplementedException: Not implemented Error: Deserialize::Geometry type not implemented yet!
Abort trap: 6
D select st_astext(wkt) FROM read_ndjson_auto('shapes_all_low.jsonl') LIMIT 1;
Error: Binder Error: No function matches the given name and argument types 'ST_AsText(VARCHAR)'. You might need to add explicit type casts.
	Candidate functions:
	ST_AsText(POINT_2D) -> VARCHAR
	ST_AsText(LINESTRING_2D) -> VARCHAR
	ST_AsText(POLYGON_2D) -> VARCHAR
	ST_AsText(GEOMETRY) -> VARCHAR

My dataset is a mix of geometry types.

D select SPLIT_PART(wkt, ' ', 1), COUNT(*) from read_ndjson_auto('shapes_all_low.jsonl') GROUP BY 1 ORDER BY 2 DESC;
┌─────────────────────────┬──────────────┐
│ split_part(wkt, ' ', 1) │ count_star() │
│         varchar         │    int64     │
├─────────────────────────┼──────────────┤
│ MULTIPOLYGON            │          196 │
│ POLYGON                 │           53 │
└─────────────────────────┴──────────────┘

marklit avatar May 05 '23 07:05 marklit

Yes! "WKT" and "Text" is the same thing. Use ST_GeomFromText(VARCHAR). You can pass an 'ignore_invalid := bool flag too to get nulls instead of error should the geom be invalid.

Maxxen avatar May 05 '23 07:05 Maxxen

Okay, thanks.

marklit avatar May 05 '23 07:05 marklit