deephaven-core
feat: add generic Iceberg catalog adapter creation to Java / Python
Java, connecting to a RESTCatalog using MinIO
import io.deephaven.iceberg.util.*;
properties = new HashMap<>();
properties.put("type", "rest");
properties.put("uri", "http://rest:8181");
properties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
properties.put("client.region", "us-east-1");
properties.put("s3.access-key-id", "admin");
properties.put("s3.secret-access-key", "password");
properties.put("s3.endpoint", "http://minio:9000");
adapter = IcebergTools.createAdapter("generic-adapter", properties);
Python, connecting to a RESTCatalog using MinIO
from deephaven.experimental import iceberg
adapter = iceberg.adapter(name="generic-adapter", properties={
    "type": "rest",
    "uri": "http://rest:8181",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "client.region": "us-east-1",
    "s3.access-key-id": "admin",
    "s3.secret-access-key": "password",
    "s3.endpoint": "http://minio:9000"
})
Java, connecting to AWS Glue
import io.deephaven.iceberg.util.*;
properties = new HashMap<>();
properties.put("type", "glue");
properties.put("uri", "s3://lab-warehouse/sales");
properties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
adapter = IcebergTools.createAdapter("generic-adapter", properties);
Python, connecting to AWS Glue
from deephaven.experimental import iceberg
adapter = iceberg.adapter(name="generic-adapter", properties={
    "type": "glue",
    "uri": "s3://lab-warehouse/sales",
    "warehouse": "s3://lab-warehouse/sales",
    "io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
})
The support libraries for the REST and Glue catalogs are bundled. For any other catalog, the user must place the required libraries on the classpath. The following catalogs need additional files:
- org.apache.iceberg.hive.HiveCatalog
- org.apache.iceberg.hadoop.HadoopCatalog
- org.apache.iceberg.nessie.NessieCatalog
- org.apache.iceberg.jdbc.JdbcCatalog
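As a rough illustration of the bundled/non-bundled split described above, the following sketch checks a properties map before it is handed to the adapter. The helper `needs_extra_libraries` is hypothetical and not part of this PR. The `catalog-impl` key and the `RESTCatalog` / `GlueCatalog` class names come from Apache Iceberg; whether the adapter in this PR accepts `catalog-impl` is an assumption here.

```python
# Hypothetical helper, not part of deephaven-core: decides whether a properties
# map names a catalog whose support libraries are bundled (REST, Glue) or one
# that needs extra jars on the classpath.

BUNDLED_TYPES = {"rest", "glue"}

# Apache Iceberg class names for the two bundled catalogs.
BUNDLED_IMPLS = {
    "org.apache.iceberg.rest.RESTCatalog",
    "org.apache.iceberg.aws.glue.GlueCatalog",
}


def needs_extra_libraries(properties: dict) -> bool:
    """Return True if the user must supply additional libraries."""
    catalog_type = properties.get("type")
    if catalog_type is not None:
        return catalog_type not in BUNDLED_TYPES

    # Assumption: a catalog may also be selected by class name via "catalog-impl".
    catalog_impl = properties.get("catalog-impl")
    if catalog_impl is not None:
        return catalog_impl not in BUNDLED_IMPLS

    raise ValueError("properties must set either 'type' or 'catalog-impl'")
```

With the REST example from the description, `needs_extra_libraries({"type": "rest", "uri": "http://rest:8181"})` returns `False`, while a map naming `org.apache.iceberg.jdbc.JdbcCatalog` returns `True`.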
It looks like PyIceberg infers the catalog type when the uri property is provided: https://github.com/apache/iceberg-python/blob/pyiceberg-0.6.1/pyiceberg/catalog/__init__.py#L155C5-L182
if uri.startswith("http"):
    return CatalogType.REST
elif uri.startswith("thrift"):
    return CatalogType.HIVE
elif uri.startswith(("sqlite", "postgresql")):
    return CatalogType.SQL
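For discussion, here is a self-contained sketch of the same inference idea applied to a properties map like the ones in the description. It is not the PyIceberg source and not what this PR implements. The `CatalogType` enum and `infer_catalog_type` function below are illustrative stand-ins, and returning `None` for an unrecognized scheme is an assumption made for this sketch.

```python
from enum import Enum
from typing import Optional


class CatalogType(Enum):
    """Illustrative stand-in for the catalog types in the excerpt above."""
    REST = "rest"
    HIVE = "hive"
    SQL = "sql"


def infer_catalog_type(properties: dict) -> Optional[CatalogType]:
    """Guess the catalog type from the scheme of the "uri" property.

    Returns None when there is no string "uri" or the scheme is not recognized.
    """
    uri = properties.get("uri")
    if not isinstance(uri, str):
        return None
    if uri.startswith("http"):  # also matches "https"
        return CatalogType.REST
    if uri.startswith("thrift"):
        return CatalogType.HIVE
    if uri.startswith(("sqlite", "postgresql")):
        return CatalogType.SQL
    return None
```

Note that the Glue example's uri (`s3://lab-warehouse/sales`) matches none of these prefixes, so a scheme-based approach like this could not replace the explicit `"type" : "glue"` entry.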
Labels indicate documentation is required. Issues for documentation have been opened:
Community: https://github.com/deephaven/deephaven-docs-community/issues/305