Feature request: java binding
I want to integrate DuckLake with Spark, so a Java binding is necessary.
See #45
IIUC, the idea behind https://github.com/duckdb/ducklake/issues/45 is to access an embedded DuckDB via JDBC, and read data via that DuckDB:
Java Application -> DuckDB JDBC Driver -> JNI -> DuckDB (native lib) -> DuckDB's DuckLake extension -> actual data files on S3
This is probably just a demo rather than a real solution. To actually support DuckLake in Spark/Java, we need an efficient way to get catalog info, such as the file list and column types, from DuckLake's metadata database, and then read those files with Spark executors, similar to how Iceberg and other data lake formats work.
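For what it's worth, the catalog lookups a Spark planner would need map naturally to queries against DuckLake's metadata database, along these lines (a sketch; the table and column names follow my reading of the DuckLake metadata schema and should be double-checked against the spec):

```sql
-- List the data files backing a table (paths to hand to Spark executors)
SELECT f.path, f.record_count, f.file_size_bytes
FROM ducklake_data_file f
JOIN ducklake_table t ON t.table_id = f.table_id
WHERE t.table_name = 'my_table';

-- Fetch the column names and types for the same table
SELECT c.column_name, c.column_type
FROM ducklake_column c
JOIN ducklake_table t ON t.table_id = c.table_id
WHERE t.table_name = 'my_table';
```

Snapshot filtering is omitted here for brevity; a real planner would also need to restrict these queries to the snapshot being read.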
@jayhan94
In general, either the ducklake connection property or session_init_sql_file is intended to be used with Spark JDBC.
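To make that concrete, a session_init_sql_file could contain something like the following (a sketch; the metadata path, data path, and catalog alias are placeholders, and the exact ATTACH options should be checked against the DuckLake docs):

```sql
-- Run by DuckDB when the JDBC session starts, before Spark issues queries
INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 's3://my-bucket/data/');
USE my_lake;
```

After this runs, Spark's JDBC queries against the session see DuckLake tables like ordinary DuckDB tables.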
@fuyufjh
Yes, the described logic of the #45 example is correct. Input on this topic and its current limitations is highly appreciated. For example, should catalog access go through a Java API, or through some kind of HTTP API? Would a one-time export to the Iceberg catalog format help, or is live access required?
Sorry, just realized I intended to link #78, not #45 for the Spark example.
Java bindings should already exist; closing this for now.
Just for the record, I think this project actually covers the request for Spark integration with DuckLake - https://github.com/OleanderHQ/ducklake-spark