
Rust <> Python integration point

Open kevinjqliu opened this issue 1 year ago • 6 comments

After establishing #518, I want to start the conversation to create the first integration between PyIceberg and iceberg-rust. As discussed in the dev list, we want to create an integration based on pluggable FileIO.

I'm wondering if there's also a way to create an integration for a pluggable catalog, based on the in-memory catalog implementation in #475.

I'm not familiar with the Rust ecosystem, so I would appreciate any pointers.

kevinjqliu, Aug 11 '24 21:08

I'm wondering if there's also a way to create an integration for a pluggable catalog, based on the in-memory catalog implementation in #475.

I believe this should also be possible. So, the pyiceberg community wants to have an in-memory catalog based on iceberg-rust. Does pyiceberg provide an interface that we can integrate with?

The in-memory catalog depends on FileIO, so we might need to build FileIO first. However, it also makes sense to expose a purely in-memory catalog (memory FileIO and memory catalog) to pyiceberg initially.
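To make the dependency concrete, here is a minimal Python sketch of a catalog that persists table metadata through a FileIO. It is only an illustration under stated assumptions: `MemoryFileIO`, `MemoryCatalog`, their methods, and the `memory://` location scheme are hypothetical names chosen for this example, not the iceberg-rust or PyIceberg APIs.

```python
import json
from typing import Dict


class MemoryFileIO:
    """Hypothetical FileIO that keeps file contents in a dict (illustration only)."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]


class MemoryCatalog:
    """Hypothetical catalog: maps table names to metadata file locations."""

    def __init__(self, file_io: MemoryFileIO) -> None:
        # The catalog cannot store metadata without a FileIO to write through.
        self._file_io = file_io
        self._tables: Dict[str, str] = {}

    def create_table(self, identifier: str, metadata: dict) -> str:
        if identifier in self._tables:
            raise ValueError(f"table already exists: {identifier}")
        # Assumed location layout, made up for this sketch.
        location = f"memory://{identifier}/metadata/v1.metadata.json"
        self._file_io.write(location, json.dumps(metadata).encode("utf-8"))
        self._tables[identifier] = location
        return location

    def load_table(self, identifier: str) -> dict:
        # The catalog only holds a pointer; the metadata itself lives in FileIO.
        location = self._tables[identifier]
        return json.loads(self._file_io.read(location).decode("utf-8"))
```

In this sketch the catalog only tracks where each table's metadata lives, while all reads and writes go through the FileIO, which is why the FileIO has to exist first.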

Xuanwo, Aug 14 '24 10:08

I think it's definitely possible, since PyIceberg's Catalog interface is extensible. I think you need to start with pyo3 first to understand how it works.

liurenjie1024, Aug 14 '24 14:08

Does pyiceberg provide an interface that we can integrate with?

Yes, there is a py-catalog-impl configuration that will try to load a given classpath. (documentation, implementation, test)
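For readers new to this pattern, here is a rough sketch of how loading a class from a configured dotted path can work in Python. It is not PyIceberg's actual implementation: `load_class` and `load_catalog_from_properties` are hypothetical helpers, and the assumption that the catalog class is constructed with a name and a properties mapping is mine. Only the `py-catalog-impl` key comes from the comment above.

```python
import importlib
from typing import Any, Mapping


def load_class(dotted_path: str) -> type:
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got: {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_catalog_from_properties(name: str, properties: Mapping[str, str]) -> Any:
    """Hypothetical helper: build a catalog from a `py-catalog-impl` entry."""
    impl = properties.get("py-catalog-impl")
    if impl is None:
        raise ValueError("missing 'py-catalog-impl' property")
    catalog_cls = load_class(impl)
    # Assumption: the class accepts (name, properties); a real class may differ.
    return catalog_cls(name, dict(properties))
```

Under this approach, a catalog implemented in Rust and exposed to Python (for example through pyo3) would only need to be importable as a Python class for such a loader to pick it up.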

The in-memory catalog depends on FileIO, so we might need to build FileIO first. However, it also makes sense to expose a purely in-memory catalog (memory FileIO and memory catalog) to pyiceberg initially.

I'm bringing up this issue because I want the simplest way to integrate iceberg-python and iceberg-rust. If FileIO integration is a prerequisite, we can start there instead.

kevinjqliu, Aug 15 '24 00:08

Hi, @kevinjqliu, I'm sorry for blocking your innovation this way.

I've been a bit busy recently, but I plan to create something that really works next week. For instance, reading data from PyIceberg using pyiceberg-core. This will enable our community to build more cool things based on that.

Xuanwo, Aug 16 '24 07:08

@Xuanwo very cool! Looking forward to it.

kevinjqliu, Aug 16 '24 10:08

Looks like @sungwy already started by exposing Transforms in #556.

I'll take a stab at exposing the Catalogs, see https://github.com/apache/iceberg-rust/pull/534#issuecomment-2330489500

kevinjqliu, Sep 05 '24 17:09

Closing this! We have a number of integrations already.

kevinjqliu, Sep 09 '25 05:09