feat: add generic Iceberg catalog adapter creation to Java / Python

Open lbooker42 opened this issue 1 year ago • 2 comments

Java, connecting to a RESTCatalog using MinIO

import io.deephaven.iceberg.util.*;

// Catalog settings, passed through as Iceberg catalog properties
properties = new HashMap<>();
properties.put("type", "rest");
properties.put("uri", "http://rest:8181");
properties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");

// Region used by the S3 client
properties.put("client.region", "us-east-1");

// MinIO credentials and endpoint (S3-compatible storage)
properties.put("s3.access-key-id", "admin");
properties.put("s3.secret-access-key", "password");
properties.put("s3.endpoint", "http://minio:9000");

adapter = IcebergTools.createAdapter("generic-adapter", properties);

Python, connecting to a RESTCatalog using MinIO

from deephaven.experimental import iceberg

adapter = iceberg.adapter(name="generic-adapter", properties={
    # Catalog settings
    "type" : "rest",
    "uri" : "http://rest:8181",
    "io-impl" : "org.apache.iceberg.aws.s3.S3FileIO",
    "client.region" : "us-east-1",
    # MinIO credentials and endpoint (S3-compatible storage)
    "s3.access-key-id" : "admin",
    "s3.secret-access-key" : "password",
    "s3.endpoint" : "http://minio:9000"
})
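
The property map above could also be assembled by a small helper so that credentials are not hardcoded. This is only a sketch: `rest_minio_properties` and the `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY` variable names are hypothetical and not part of the proposed API; the property keys are the ones used in the example above.

```python
import os


def rest_minio_properties(catalog_uri, s3_endpoint, region="us-east-1", env=os.environ):
    """Build a REST-catalog property map for an S3-compatible store such as MinIO.

    Hypothetical helper; the environment variable names are assumptions.
    """
    return {
        "type": "rest",
        "uri": catalog_uri,
        "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "client.region": region,
        "s3.access-key-id": env["MINIO_ACCESS_KEY"],
        "s3.secret-access-key": env["MINIO_SECRET_KEY"],
        "s3.endpoint": s3_endpoint,
    }


# An explicit mapping stands in for the process environment here.
props = rest_minio_properties(
    "http://rest:8181",
    "http://minio:9000",
    env={"MINIO_ACCESS_KEY": "admin", "MINIO_SECRET_KEY": "password"},
)

# In a Deephaven session the result would be handed to the proposed call:
# adapter = iceberg.adapter(name="generic-adapter", properties=props)
```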

Java, connecting to AWS Glue

import io.deephaven.iceberg.util.*;

properties = new HashMap<>();
properties.put("type", "glue");
properties.put("uri", "s3://lab-warehouse/sales");
// Same warehouse location as in the Python example
properties.put("warehouse", "s3://lab-warehouse/sales");
properties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");

adapter = IcebergTools.createAdapter("generic-adapter", properties);

Python, connecting to AWS Glue

from deephaven.experimental import iceberg

adapter = iceberg.adapter(name="generic-adapter", properties={
    "type" : "glue",
    "uri" : "s3://lab-warehouse/sales",
    "warehouse" : "s3://lab-warehouse/sales",
    "io-impl" : "org.apache.iceberg.aws.s3.S3FileIO"
});

lbooker42, Jul 10 '24 23:07

We bundle the support libraries for the REST and Glue catalogs. For any other catalog, the user must put the required libraries on the classpath. These catalogs need additional files:

  • org.apache.iceberg.hive.HiveCatalog
  • org.apache.iceberg.hadoop.HadoopCatalog
  • org.apache.iceberg.nessie.NessieCatalog
  • org.apache.iceberg.jdbc.JdbcCatalog
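
A rough sketch of how a caller might check up front whether a property map points at a catalog that needs extra libraries. The helper name `extra_library_needed` is hypothetical, and the mapping from `type` values to class names reflects my understanding of Iceberg's built-in catalog types rather than anything Deephaven exposes.

```python
# Assumed mapping of Iceberg catalog "type" values to implementation classes.
CATALOG_IMPLS = {
    "rest": "org.apache.iceberg.rest.RESTCatalog",
    "glue": "org.apache.iceberg.aws.glue.GlueCatalog",
    "hive": "org.apache.iceberg.hive.HiveCatalog",
    "hadoop": "org.apache.iceberg.hadoop.HadoopCatalog",
    "nessie": "org.apache.iceberg.nessie.NessieCatalog",
    "jdbc": "org.apache.iceberg.jdbc.JdbcCatalog",
}

# Catalog types whose support libraries are bundled, per the comment above.
BUNDLED_TYPES = {"rest", "glue"}


def extra_library_needed(properties):
    """Return the catalog class the user must supply, or None if it is bundled.

    Hypothetical helper; raises ValueError for a missing or unknown "type".
    """
    catalog_type = properties.get("type")
    if catalog_type is None or catalog_type.lower() not in CATALOG_IMPLS:
        raise ValueError(f"Unknown or missing catalog type: {catalog_type!r}")
    catalog_type = catalog_type.lower()
    if catalog_type in BUNDLED_TYPES:
        return None
    return CATALOG_IMPLS[catalog_type]
```

For example, `extra_library_needed({"type": "nessie"})` returns the Nessie catalog class name, while a `"rest"` or `"glue"` map returns `None`.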

lbooker42, Jul 11 '24 00:07

Looks like PyIceberg infers the catalog type when the uri property is provided. https://github.com/apache/iceberg-python/blob/pyiceberg-0.6.1/pyiceberg/catalog/__init__.py#L155C5-L182

            if uri.startswith("http"):
                return CatalogType.REST
            elif uri.startswith("thrift"):
                return CatalogType.HIVE
            elif uri.startswith(("sqlite", "postgresql")):
                return CatalogType.SQL
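
To make the excerpt easier to try out, here is a self-contained sketch of the same prefix-based inference. It is not PyIceberg's actual function: the `CatalogType` enum is a stand-in defined locally, and the error handling is an assumption.

```python
from enum import Enum


class CatalogType(Enum):
    """Local stand-in for the catalog types named in the excerpt."""
    REST = "rest"
    HIVE = "hive"
    SQL = "sql"


def infer_catalog_type(properties):
    """Infer the catalog type from the scheme prefix of the "uri" property."""
    uri = properties.get("uri")
    if not isinstance(uri, str):
        raise ValueError(f"Expected a string uri, got: {uri!r}")
    if uri.startswith("http"):
        return CatalogType.REST
    elif uri.startswith("thrift"):
        return CatalogType.HIVE
    elif uri.startswith(("sqlite", "postgresql")):
        return CatalogType.SQL
    # Assumption: anything else cannot be inferred and is rejected.
    raise ValueError(f"Could not infer the catalog type from the uri: {uri}")
```

Note that under this rule the `s3://` URI from the Glue examples would not resolve to any type, so an explicit `type` would still be needed there.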

devinrsmith, Jul 11 '24 16:07

Labels indicate documentation is required. Issues for documentation have been opened:

Community: https://github.com/deephaven/deephaven-docs-community/issues/305

deephaven-internal, Sep 06 '24 21:09