iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

discuss: re-export arrow types

Open xxchan opened this issue 1 year ago • 4 comments

Hi, I propose to re-export arrow types in iceberg. Want to hear your opinions @liurenjie1024 @Xuanwo

Rationale

When a crate uses a dependency's type in the public API, it would be better to re-export the dependency.

The largest benefit is ease of handling multiple versions:

[dependencies]
# arrow used in the project
arrow = "50"

# which depends on arrow 52
iceberg = "0.3" 

In this case, if we want to pass arrow_array::RecordBatch to iceberg, we have to do something like this:

[dependencies]
# arrow used in the project
arrow-array = "50"

# which depends on arrow 52
iceberg = "0.3" 
arrow-array-for-iceberg = { package="arrow", version="52"}

and then pass arrow_array_for_arrow_udf::RecordBatch to iceberg. If iceberg re-export it like iceberg::arrow::RecordBatch, then we don't need to do so.

Note: This assumes that multiple versions are to some extent unavoidable. It won't help if we want to pass arrow 50 data to arrow 52 data. But it's more like some part of the code only want to pass data to iceberg without affecting other parts of the code that uses arrow 50.

Finally, I don't think there are any drawbacks to re-export arrow in iceberg.

More discussion about this problem:

xxchan avatar Sep 19 '24 09:09 xxchan

Example about this practice: https://docs.rs/deltalake/0.20.0/deltalake/arrow/index.html

https://github.com/delta-io/delta-rs/blob/b3b23be91b8eb8207473785d4b0c20c5d2c174b2/crates/core/src/lib.rs#L101-L102

xxchan avatar Sep 19 '24 09:09 xxchan

+1 for this, which helps to decipher in compatible types with same name. cc @Xuanwo What do you think?

liurenjie1024 avatar Sep 27 '24 02:09 liurenjie1024

-0 on this.

I'm not in favor of re-exporting other crates based on my own experience. However, as @xxchan mentioned, there aren't many drawbacks, so I'm okay if someone wants to add it.

Xuanwo avatar Sep 28 '24 06:09 Xuanwo

One case I think this helps is that when another project uses both arrow and iceberg, but with different versions, just as the case in the issue. When you pass them as arguments, you may see quite confusing compiler error like arrow::RecordBatch is not arrow::RecordBatch, since rust treat types from diffrent crate versions as different types. A more detailed example could be found here. Re-exporting makes things easier to understand.

liurenjie1024 avatar Sep 29 '24 01:09 liurenjie1024

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Sep 20 '25 00:09 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Oct 05 '25 00:10 github-actions[bot]