Tracking issue: align storage support with iceberg-java

Xuanwo opened this issue 1 year ago • 11 comments

iceberg-java currently supports the following storage services (checked items are already supported by iceberg-rust):

  • [x] aws s3
  • [ ] aliyun oss
  • [ ] azure adlsv2
  • [ ] gcp gcs
  • [ ] hadoop hdfs
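To make the checklist concrete, here is a minimal sketch of how a FileIO-style dispatcher could route a table location to one of these backends by its URI scheme. The `Backend` enum, `backend_for`, and `is_supported` are hypothetical names, and the scheme strings are commonly used ones assumed for illustration; this is not iceberg-rust's or iceberg-java's actual mapping.

```rust
/// Storage backends tracked in this issue (hypothetical enum, not a real API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    S3,
    Oss,
    AzureDls,
    Gcs,
    Hdfs,
}

/// Map a location such as "s3://bucket/path" to a backend by its scheme.
/// The scheme list is an assumption; returns None for unknown schemes.
fn backend_for(location: &str) -> Option<Backend> {
    let (scheme, _rest) = location.split_once("://")?;
    match scheme.to_ascii_lowercase().as_str() {
        "s3" | "s3a" | "s3n" => Some(Backend::S3),
        "oss" => Some(Backend::Oss),
        "abfs" | "abfss" => Some(Backend::AzureDls),
        "gs" => Some(Backend::Gcs),
        "hdfs" => Some(Backend::Hdfs),
        _ => None,
    }
}

/// Mirrors the checklist: only S3 is ticked at the time of this issue.
fn is_supported(backend: Backend) -> bool {
    matches!(backend, Backend::S3)
}

fn main() {
    for loc in ["s3://warehouse/t1", "gs://warehouse/t2", "file:///tmp/t3"] {
        match backend_for(loc) {
            Some(b) => println!("{loc}: {b:?}, supported = {}", is_supported(b)),
            None => println!("{loc}: unknown scheme"),
        }
    }
}
```

Closing an item in the checklist would then amount to adding its backend to `is_supported` once the implementation and tests land.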

Although OpenDAL supports many more storage services than these, it still makes sense for iceberg-rust to support at least every storage service that iceberg-java already supports. This issue will track the progress.


Once this has been implemented, iceberg-rust will have the same level of storage support as iceberg-java. I'm willing to implement these features, and I'm also open to helping review related changes. Please comment if you want to join the development and pick up one of them.

Xuanwo, Jun 19 '24

cc @liurenjie1024, I'm not sure how your 0.3 release plan is going. Maybe we can include this one in it?

Most changes should be easy and require no API changes. It's also fine to include them in one of the following 0.3.x releases.

Xuanwo, Jun 19 '24

Hi @Xuanwo, there are two places where 0.3 features are tracked:

  1. https://github.com/apache/iceberg-rust/issues/348
  2. https://github.com/apache/iceberg-rust/milestone/2

I'm OK with waiting to add this to the 0.3 release. I'm just curious how we would test against these services. Or maybe we can start by declaring these features experimental.

liurenjie1024, Jun 19 '24

  • [x] aws s3: tested with MinIO now. We can add a real S3 bucket once we have a sponsor.
  • [ ] aliyun oss: needs an OSS bucket (preferably located near us-east-1)
  • [ ] azure adlsv2: can be tested with Azurite. I'm also willing to provide test infrastructure as a Microsoft MVP.
  • [ ] gcp gcs: needs a GCS bucket (preferably located near us-east-1)
  • [ ] hadoop hdfs: can be set up in CI directly (thanks, open source!)
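The test plan splits the backends into two groups: those that can run against a local service in CI (MinIO, Azurite, HDFS) and those that need a real cloud bucket (OSS, GCS). A minimal sketch of how integration tests might be gated on that split is below. The service keys and the `ICEBERG_TEST_*_BUCKET` environment variable names are hypothetical placeholders, not existing iceberg-rust settings.

```rust
use std::env;

/// How a backend could be exercised in CI, following the plan above.
#[derive(Debug, PartialEq, Eq)]
enum TestStrategy {
    /// A local emulator or service started in CI.
    LocalService(&'static str),
    /// A real cloud bucket; the env var name is a hypothetical placeholder.
    RealBucket { env_var: &'static str },
}

/// Hypothetical service keys mapped to their test strategy.
fn strategy_for(service: &str) -> Option<TestStrategy> {
    match service {
        "s3" => Some(TestStrategy::LocalService("minio")),
        "azdls" => Some(TestStrategy::LocalService("azurite")),
        "hdfs" => Some(TestStrategy::LocalService("hdfs")),
        "oss" => Some(TestStrategy::RealBucket { env_var: "ICEBERG_TEST_OSS_BUCKET" }),
        "gcs" => Some(TestStrategy::RealBucket { env_var: "ICEBERG_TEST_GCS_BUCKET" }),
        _ => None,
    }
}

/// Decide whether an integration test should run. `lookup` abstracts env
/// access so the decision can be checked without a real environment.
fn should_run(strategy: &TestStrategy, lookup: impl Fn(&str) -> Option<String>) -> bool {
    match strategy {
        // Local services are assumed to be started by the CI job itself.
        TestStrategy::LocalService(_) => true,
        // Real buckets run only when a non-empty bucket name is configured.
        TestStrategy::RealBucket { env_var } => lookup(*env_var).map_or(false, |v| !v.is_empty()),
    }
}

fn main() {
    for service in ["s3", "azdls", "gcs", "oss", "hdfs"] {
        let strategy = strategy_for(service).expect("known service");
        let run = should_run(&strategy, |k| env::var(k).ok());
        println!("{service}: {strategy:?} -> run = {run}");
    }
}
```

Skipping bucket-backed tests when no bucket is configured keeps contributor CI green while those backends remain experimental.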

I agree that we can label these features as experimental. Setting up the CI infrastructure will take more time than implementing the features themselves.

Xuanwo, Jun 19 '24

I have a question on the current FileIO - @Xuanwo is probably the right person to ask here.

It would be useful to be able to customize the OpenDAL Operator by attaching layers to it. Could we expose this capability somewhere? I'd be more than happy to work on this.

sdd, Jun 19 '24

Cool, let's move!

liurenjie1024, Jun 19 '24

Do you have more detailed ideas? Are you talking about enabling some existing OpenDAL layers, or about allowing users to implement something new on top of FileIO?

I can imagine that enabling the logging and retry layers, either by default or through configuration, might be useful.
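OpenDAL provides real `RetryLayer` and `LoggingLayer` types in `opendal::layers`; the sketch below does not use them. It is a self-contained illustration of the wrapping idea behind layers, assuming a hypothetical `Storage` trait with a single `read` method and a `FlakyBackend` that fails a fixed number of times to simulate transient errors.

```rust
use std::cell::{Cell, RefCell};

/// Minimal read interface standing in for a storage operator.
/// NOT OpenDAL's Layer/Access API; it only illustrates the decorator idea.
trait Storage {
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// Hypothetical backend that fails `failures_left` times, then succeeds.
struct FlakyBackend {
    failures_left: Cell<u32>,
}

impl Storage for FlakyBackend {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        let left = self.failures_left.get();
        if left > 0 {
            self.failures_left.set(left - 1);
            return Err(format!("transient error reading {path}"));
        }
        Ok(path.as_bytes().to_vec())
    }
}

/// Retry layer: re-issues the call up to `max_attempts` times (use >= 1).
struct Retry<S> {
    inner: S,
    max_attempts: u32,
}

impl<S: Storage> Storage for Retry<S> {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        let mut last_err = String::new();
        for _ in 0..self.max_attempts {
            match self.inner.read(path) {
                Ok(bytes) => return Ok(bytes),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}

/// Logging layer: records each call's outcome in a Vec instead of a logger.
struct Logging<S> {
    inner: S,
    log: RefCell<Vec<String>>,
}

impl<S: Storage> Storage for Logging<S> {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        let result = self.inner.read(path);
        let status = if result.is_ok() { "ok" } else { "err" };
        self.log.borrow_mut().push(format!("read {path}: {status}"));
        result
    }
}

fn main() {
    // Layers compose by wrapping: Logging(Retry(backend)).
    let op = Logging {
        inner: Retry { inner: FlakyBackend { failures_left: Cell::new(2) }, max_attempts: 3 },
        log: RefCell::new(Vec::new()),
    };
    let data = op.read("metadata/v1.json");
    println!("{:?}", data.map(|b| b.len())); // prints Ok(16)
    println!("{:?}", op.log.borrow()); // prints ["read metadata/v1.json: ok"]
}
```

The wrapping order matters: with logging outside retry, one entry is recorded per logical read; swapping them to `Retry(Logging(backend))` would log every individual attempt.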

Xuanwo, Jun 19 '24

Split the CI infrastructure work into a new issue: https://github.com/apache/iceberg-rust/issues/410.

I plan to track them after the 0.3 release.

Xuanwo, Jun 19 '24

@Xuanwo I can take the Azure Data Lake FileIO implementation plus the corresponding infrastructure setup. Does that sound OK?

jsimbadev, Jul 04 '24

Welcome, have fun!

Xuanwo, Jul 05 '24

cc @Xuanwo Do you still plan to finish this in 0.3.0, or should we postpone it to the next release?

liurenjie1024, Aug 06 '24

There is still some work to do on the OpenDAL side. I believe we can release 0.3.0 first.

Xuanwo, Aug 06 '24

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Oct 09 '25

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

github-actions[bot], Oct 24 '25