[WIP] Add support for creating WKB/WKT attributes
Features:
- Add WKB/WKT as possible attribute types
Fixes:
- Fix getting fill value for BLOB, WKB, and WKT types
- closes https://github.com/TileDB-Inc/TileDB-Py/pull/1912
- [sc-41135]
I've been testing locally and this works great. This unblocks a whole bunch of testing I need to do! Thanks @jp-dark
I'm using the code below to create a geometry array compatible with GDAL/OGR metadata conventions. This is probably too application-specific for this repo's test suite but I'll leave it here for reference.
import os
import numpy as np
from shapely import box
import tiledb
path = "/tmp/geom_array"
schema = tiledb.ArraySchema(
domain=tiledb.Domain(
# Globe in Longitude, Latitude coordinates, 1 degree square tiles
tiledb.Dim(name="_X", domain=(-180.0, 180.0), tile=1.0, dtype=np.float64),
tiledb.Dim(name="_Y", domain=(-90.0, 90.0), tile=1.0, dtype=np.float64),
),
attrs=[
tiledb.Attr(name="fid", dtype=np.int64),
# GDAL currently only supports blob
# tiledb.Attr(name="wkb_geometry", dtype="blob"),
tiledb.Attr(name="wkb_geometry", dtype="wkb"),
],
allows_duplicates=True,
cell_order="row-major",
tile_order="row-major",
capacity=100_000,
sparse=True,
)
geoms = [
box(2, 2, 4, 4),
box(1, 1, 3, 3),
box(0, 0, 2, 2),
]
wkbs = [g.wkb for g in geoms]
bboxes = [g.bounds for g in geoms]
fids = np.array([i for i, _ in enumerate(geoms)])
n = len(geoms)
# use the centroid of the bounding box as dimensions
xs = [(b[0] + b[2]) / 2 for b in bboxes]
ys = [(b[1] + b[3]) / 2 for b in bboxes]
# use half the max dimensions for padding
max_width = max(b[2] - b[0] for b in bboxes)
max_height = max(b[3] - b[1] for b in bboxes)
padx = max_width / 2
pady = max_height / 2
tiledb.Array.create(path, schema)
# Write OGR Metadata
with tiledb.open(path, "w") as arr:
# arr.meta["CRS"] = CRS
arr.meta["GeometryType"] = "Polygon"
arr.meta["GEOMETRY_ATTRIBUTE_NAME"] = "wkb_geometry"
arr.meta["LAYER_EXTENT_MINX"] = -180.0
arr.meta["LAYER_EXTENT_MINY"] = -90.0
arr.meta["LAYER_EXTENT_MAXX"] = 180.0
arr.meta["LAYER_EXTENT_MAXY"] = 90.0
arr.meta["PAD_X"] = padx
arr.meta["PAD_Y"] = pady
arr.meta["FEATURE_COUNT"] = n
# Write attributes themselves
with tiledb.open(path, mode="w") as arr:
data = {
"fid": fids,
"wkb_geometry": np.array(
wkbs,
dtype="O",
),
}
arr[xs, ys] = data
Using the blob type results in a valid geometry array according to ogrinfo. Using the wkb type does not, yet.
$ ogrinfo -al /tmp/geom_array
INFO: Open of `/tmp/geom_array'
using driver `TileDB' successful.
Layer name: geom_array
Geometry: Polygon
Feature Count: 3
Extent: (-180.000000, -90.000000) - (180.000000, 90.000000)
fid: Integer64 (0.0) NOT NULL
wkb_geometry: String (0.0) NOT NULL
ERROR 1: std::get: wrong index for variant
I'll start looking into the GDAL code to see what it will take to support reading/writing the GEOM_WKB type instead of BLOB - It should be doable in a backwards compatible way.
Note: This PR requires an update to how WKT is handled (should support strings -> binary blob) and tests for round tripping data for both WKT and WKB.
In particular, is there anything we should re-test here, and do you know if we need to build ogrinfo from source to be able to repro that?
Those test were all @perrygeo. I haven't been directly involved in any of the GDAL support or testing. Maybe @rouault can answer about the orginfo changes?
Maybe @rouault can answer about the orginfo changes?
not sure what the question is :-) I've a queued GDAL pull request in https://github.com/OSGeo/gdal/pull/9726 that will use GEOM_WKB when creating TileDB vector datasets with TileDB >= 2.21. This is independent from TileDB-Py support for GEOM_WKB. I guess that you're unit tests that read/write GEOM_WKB attributes should be more than enough to ensure interoperability with datasets created by GDAL. Or just try the reproducer of https://github.com/TileDB-Inc/TileDB-Py/issues/1956
It is not planned that the GDAL TileDB driver will handle GEOM_WKT for now, as this is a rather inefficient way of storing geometries.
But I agree with @jp-dark that on the Python side, I would expect a GEOM_WKT to be dealt as a Python string rather than a blob type