
Feature request: java binding

Open jayhan94 opened this issue 6 months ago • 4 comments

I want to integrate DuckLake with Spark, so a Java binding is necessary.

jayhan94 avatar Jun 06 '25 02:06 jayhan94

See #45

Tishj avatar Jun 06 '25 07:06 Tishj

IIUC, the idea behind https://github.com/duckdb/ducklake/issues/45 is to access an embedded DuckDB via JDBC, and read data via that DuckDB:

Java Application -> DuckDB JDBC Driver -> JNI -> DuckDB (native lib) -> DuckDB's DuckLake extension -> actual data files on S3

This is probably just a demo rather than a real solution. To actually support DuckLake in Spark/Java, we need an efficient way to get catalog info, such as the file list and column types, from DuckLake's catalog database, and then read those files using Spark executors, similar to how Iceberg and other data lake formats are integrated.
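As a rough illustration of the "get catalog info, then let executors read the files" idea, the sketch below only builds the two SQL strings a driver-side planner might run against the catalog database. The table and column names (ducklake_data_file, ducklake_column, begin_snapshot, end_snapshot, and so on) are assumptions taken from my reading of the DuckLake specification and should be checked against the spec version in use; the class and method names are hypothetical. Delete files and relative paths are ignored here.

```java
// Hypothetical helper: builds catalog queries for planning a scan.
// Table/column names are ASSUMED from the DuckLake spec; verify before use.
public class DuckLakeCatalogQueries {

    // Data files visible in a snapshot.
    // Parameters: table_id, snapshot_id, snapshot_id.
    static String dataFilesSql() {
        return "SELECT path, file_format, record_count "
             + "FROM ducklake_data_file "
             + "WHERE table_id = ? "
             + "AND begin_snapshot <= ? "
             + "AND (end_snapshot IS NULL OR end_snapshot > ?)";
    }

    // Column names and types visible in the same snapshot.
    // Parameters: table_id, snapshot_id, snapshot_id.
    static String columnsSql() {
        return "SELECT column_name, column_type "
             + "FROM ducklake_column "
             + "WHERE table_id = ? "
             + "AND begin_snapshot <= ? "
             + "AND (end_snapshot IS NULL OR end_snapshot > ?) "
             + "ORDER BY column_order";
    }

    public static void main(String[] args) {
        // A real planner would run these through a PreparedStatement against
        // the catalog database and hand the resulting paths to executors.
        System.out.println(dataFilesSql());
        System.out.println(columnsSql());
    }
}
```

The point of the split is that only the driver touches the catalog, while the executors read the listed Parquet files directly, without going through an embedded DuckDB.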

fuyufjh avatar Jun 06 '25 07:06 fuyufjh

@jayhan94

In general, either the ducklake connection property or session_init_sql_file is intended for use with Spark JDBC.
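A minimal sketch of the session_init_sql_file route, assuming the DuckDB JDBC driver accepts the option in the URL as `jdbc:duckdb:;session_init_sql_file=<path>` (check the driver documentation for your version). The metadata path `metadata.ducklake`, the alias `my_lake`, and the class and method names are placeholders for illustration. The code only writes the init file and prints the URL, so it runs without the driver on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: prepares an init SQL file that attaches a DuckLake
// catalog, plus the JDBC URL that points the driver at that file.
public class DuckLakeJdbcInit {

    // SQL executed at the start of each session to attach the catalog.
    static String initSql(String metadataPath, String alias) {
        return "INSTALL ducklake;\n"
             + "LOAD ducklake;\n"
             + "ATTACH 'ducklake:" + metadataPath + "' AS " + alias + ";\n"
             + "USE " + alias + ";\n";
    }

    // ASSUMPTION: URL option syntax as described in the lead-in.
    static String jdbcUrl(Path initFile) {
        return "jdbc:duckdb:;session_init_sql_file=" + initFile.toAbsolutePath();
    }

    public static void main(String[] args) throws IOException {
        Path initFile = Files.createTempFile("ducklake-init", ".sql");
        Files.writeString(initFile, initSql("metadata.ducklake", "my_lake"));
        // This URL would be passed as the "url" option of Spark's JDBC source.
        System.out.println(jdbcUrl(initFile));
    }
}
```

With Spark, the init file has to exist at the same path on every node that opens a JDBC connection, since each executor starts its own embedded DuckDB session.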

@fuyufjh

Yes, the logic described for the #45 example is correct. Input on this topic and its current limitations is highly appreciated. For example, should catalog access go through a Java API or through some kind of HTTP API? Would a one-time export to the Iceberg catalog format help, or is live access required?

staticlibs avatar Jun 06 '25 07:06 staticlibs

Sorry, just realized I intended to link #78, not #45 for the Spark example.

staticlibs avatar Jun 06 '25 09:06 staticlibs

Java bindings should already exist, closing this for now.

pdet avatar Dec 10 '25 23:12 pdet

Just for the record, I think this project actually covers the request for Spark integration with DuckLake - https://github.com/OleanderHQ/ducklake-spark

staticlibs avatar Dec 11 '25 00:12 staticlibs