TileDB-Py icon indicating copy to clipboard operation
TileDB-Py copied to clipboard

[WIP] Add support for creating WKB/WKT attributes

Open jp-dark opened this issue 1 year ago • 4 comments

Features:

  • Add WKB/WKT as possible attribute types

Fixes:

  • Fix getting fill value for BLOB, WKB, and WKT types

  • closes https://github.com/TileDB-Inc/TileDB-Py/pull/1912
  • [sc-41135]

jp-dark avatar Mar 05 '24 15:03 jp-dark

I've been testing locally and this works great. This unblocks a whole bunch of testing I need to do! Thanks @jp-dark

I'm using the code below to create a geometry array compatible with GDAL/OGR metadata conventions. This is probably too application-specific for this repo's test suite but I'll leave it here for reference.

import os

import numpy as np
from shapely import box

import tiledb

path = "/tmp/geom_array"
schema = tiledb.ArraySchema(
    domain=tiledb.Domain(
        # Globe in Longitude, Latitude coordinates, 1 degree square tiles
        tiledb.Dim(name="_X", domain=(-180.0, 180.0), tile=1.0, dtype=np.float64),
        tiledb.Dim(name="_Y", domain=(-90.0, 90.0), tile=1.0, dtype=np.float64),
    ),
    attrs=[
        tiledb.Attr(name="fid", dtype=np.int64),
        # GDAL currently only supports blob
        # tiledb.Attr(name="wkb_geometry", dtype="blob"),
        tiledb.Attr(name="wkb_geometry", dtype="wkb"),
    ],
    allows_duplicates=True,
    cell_order="row-major",
    tile_order="row-major",
    capacity=100_000,
    sparse=True,
)

geoms = [
    box(2, 2, 4, 4),
    box(1, 1, 3, 3),
    box(0, 0, 2, 2),
]
wkbs = [g.wkb for g in geoms]
bboxes = [g.bounds for g in geoms]
fids = np.array([i for i, _ in enumerate(geoms)])
n = len(geoms)

# use the centroid of the bounding box as dimensions
xs = [(b[0] + b[2]) / 2 for b in bboxes]
ys = [(b[1] + b[3]) / 2 for b in bboxes]

# use half the max dimensions for padding
max_width = max(b[2] - b[0] for b in bboxes)
max_height = max(b[3] - b[1] for b in bboxes)
padx = max_width / 2
pady = max_height / 2

tiledb.Array.create(path, schema)

# Write OGR Metadata
with tiledb.open(path, "w") as arr:
    # arr.meta["CRS"] = CRS
    arr.meta["GeometryType"] = "Polygon"
    arr.meta["GEOMETRY_ATTRIBUTE_NAME"] = "wkb_geometry"
    arr.meta["LAYER_EXTENT_MINX"] = -180.0
    arr.meta["LAYER_EXTENT_MINY"] = -90.0
    arr.meta["LAYER_EXTENT_MAXX"] = 180.0
    arr.meta["LAYER_EXTENT_MAXY"] = 90.0
    arr.meta["PAD_X"] = padx
    arr.meta["PAD_Y"] = pady
    arr.meta["FEATURE_COUNT"] = n

# Write attributes themselves
with tiledb.open(path, mode="w") as arr:
    data = {
        "fid": fids,
        "wkb_geometry": np.array(
            wkbs,
            dtype="O",
        ),
    }
    arr[xs, ys] = data

Using the blob type results in a valid geometry array according to ogrinfo. Using the wkb type does not, yet.

$ ogrinfo -al /tmp/geom_array
INFO: Open of `/tmp/geom_array'
      using driver `TileDB' successful.
Layer name: geom_array
Geometry: Polygon
Feature Count: 3
Extent: (-180.000000, -90.000000) - (180.000000, 90.000000)
fid: Integer64 (0.0) NOT NULL
wkb_geometry: String (0.0) NOT NULL
ERROR 1: std::get: wrong index for variant

I'll start looking into the GDAL code to see what it will take to support reading/writing the GEOM_WKB type instead of BLOB - It should be doable in a backwards compatible way.

perrygeo avatar Mar 18 '24 18:03 perrygeo

Note: This PR requires an update to how WKT is handled (should support strings -> binary blob) and tests for round tripping data for both WKT and WKB.

jp-dark avatar Mar 25 '24 16:03 jp-dark

In particular, is there anything we should re-test here, and do you know if we need to build ogrinfo from source to be able to repro that?

Those test were all @perrygeo. I haven't been directly involved in any of the GDAL support or testing. Maybe @rouault can answer about the orginfo changes?

jp-dark avatar Apr 25 '24 14:04 jp-dark

Maybe @rouault can answer about the orginfo changes?

not sure what the question is :-) I've a queued GDAL pull request in https://github.com/OSGeo/gdal/pull/9726 that will use GEOM_WKB when creating TileDB vector datasets with TileDB >= 2.21. This is independent from TileDB-Py support for GEOM_WKB. I guess that you're unit tests that read/write GEOM_WKB attributes should be more than enough to ensure interoperability with datasets created by GDAL. Or just try the reproducer of https://github.com/TileDB-Inc/TileDB-Py/issues/1956

It is not planned that the GDAL TileDB driver will handle GEOM_WKT for now, as this is a rather inefficient way of storing geometries.

But I agree with @jp-dark that on the Python side, I would expect a GEOM_WKT to be dealt as a Python string rather than a blob type

rouault avatar Apr 25 '24 15:04 rouault