icelake
Roadmap of IceLake v0.1
Iceberg is an open table format designed for analytic datasets. However, the lack of a mature Rust binding for Iceberg makes it difficult to integrate Iceberg with databases like Databend.
IceLake intends to fill this gap. By developing icelake, I hope to build an open ecosystem in which:
- Users can read and write Iceberg tables on ANY storage service, such as s3, gcs, azblob, hdfs, and so on.
- ANY database can integrate with icelake to read and write Iceberg tables.
- icelake provides NATIVE conversion between Iceberg data and arrow.
- icelake provides bindings so that other languages can operate on Iceberg tables powered by the Rust core.
For IceLake v0.1, I plan to implement the following features:
- Set up the project layout and development loop so that the community can take part.
- Support reading data from Iceberg v2 tables on storage services (only a limited set of file formats will be supported).
- Evaluate our design by integrating it with Databend.
This project is sponsored by Databend Labs
Updates on 2023-06-17:
We have released version 0.0.1, which includes all the necessary types. Our next step is to integrate with databend to ensure we are proceeding in the right direction.
cc our sponsors, FYI: @BohuTANG, @sundy-li, @flaneur2020, @ZhiHanZ
Very excited for this!
Looking forward to it!
Updates on 2023-06-29:
IceLake is almost functional on Databend now: https://github.com/datafuselabs/databend/pull/11923
I am currently working on resolving some issues with reading Parquet files in Databend. However, I am confident that I can address these issues within the next two days.
Once we successfully test our initial proof of concept for reading, we will release version 0.1 and clean up our code. We will also add more documentation to enable our community to participate.
cc our sponsors, FYI: @BohuTANG, @sundy-li, @flaneur2020, @ZhiHanZ
These notes are the result of my study of icelake.
ABOUT icelake
An example of the icelake entry point, taken from examples/read_iceberg_table.rs:
- Input: a directory containing an Iceberg table
let table_uri = format!("{}/testdata/simple_table", env::current_dir()?.display());
- Output: the opened table
let table = Table::open(table_uri.as_str()).await?;
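Getting an ArrowSchema from icelake::in_memory::Schema
- Input: an icelake schema
let schema = types::Schema {..};
- Output: the equivalent Arrow schema (the conversion can fail, hence try_from)
let arrow_schema = ArrowSchema::try_from(schema).unwrap();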
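The try_from call above relies on Rust's TryFrom trait: converting a schema can fail when a type has no mapping. The sketch below shows that shape with small hypothetical types (IcebergSchema, ArrowLikeSchema and their field/type enums are illustrative stand-ins, not icelake's or arrow's real types). It maps Iceberg's required/optional flag to Arrow-style nullability and returns an error for a type it deliberately leaves unmapped.

```rust
use std::convert::TryFrom;

// Hypothetical Iceberg-style primitive types (a small subset, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum IcebergType {
    Boolean,
    Int,
    Long,
    String,
    // Left unmapped in this sketch to demonstrate the error path.
    Uuid,
}

#[derive(Debug, Clone, PartialEq)]
struct IcebergField {
    name: String,
    required: bool,
    ty: IcebergType,
}

#[derive(Debug, Clone, PartialEq)]
struct IcebergSchema {
    fields: Vec<IcebergField>,
}

// Hypothetical Arrow-style types, standing in for a real Arrow DataType.
#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Boolean,
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct ArrowField {
    name: String,
    nullable: bool,
    ty: ArrowType,
}

#[derive(Debug, Clone, PartialEq)]
struct ArrowLikeSchema {
    fields: Vec<ArrowField>,
}

impl TryFrom<IcebergSchema> for ArrowLikeSchema {
    type Error = String;

    fn try_from(schema: IcebergSchema) -> Result<Self, Self::Error> {
        let fields = schema
            .fields
            .into_iter()
            .map(|f| {
                // Map each supported Iceberg type to its Arrow-style counterpart.
                let ty = match f.ty {
                    IcebergType::Boolean => ArrowType::Boolean,
                    IcebergType::Int => ArrowType::Int32,
                    IcebergType::Long => ArrowType::Int64,
                    IcebergType::String => ArrowType::Utf8,
                    other => {
                        return Err(format!("unsupported type {:?} for field {}", other, f.name))
                    }
                };
                // Iceberg marks fields required/optional; Arrow expresses this as nullability.
                Ok(ArrowField { name: f.name, nullable: !f.required, ty })
            })
            // Stop at the first field that cannot be converted.
            .collect::<Result<Vec<_>, _>>()?;
        Ok(ArrowLikeSchema { fields })
    }
}
```

Returning an error instead of panicking lets the caller decide what to do; the original snippet's unwrap() simply turns that error into a panic.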
ABOUT the parquet feature
What you could use:
- ParquetWriterBuilder: wraps an opendal::Writer and also needs an arrow_schema.
use opendal::{services::Memory, Operator};
let op = Operator::new(Memory::default())?.finish();
let w = op.writer("test").await?;
// ...
let mut pw = ParquetWriterBuilder::new(w, to_write.schema()).build()?;
// pw.write(&to_write).await?;
- ParquetStreamBuilder: wraps an opendal::Reader.
use futures::StreamExt; // brings next() into scope for the record-batch stream
let op = Operator::new(Memory::default())?.finish();
let r = op.reader("test").await?;
let mut reader = ParquetStreamBuilder::new(r).build().await?;
let res = reader.next().await.unwrap()?;
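Both builders follow the same pattern: they wrap an inner I/O handle (an opendal::Writer or opendal::Reader), and the writer side also needs a schema before it can produce a writer. A minimal std-only sketch of that pattern follows. The names (RowWriterBuilder, RowWriter, read_rows, SimpleSchema) are hypothetical, and it writes comma-separated lines rather than Parquet, so it illustrates only the wrapping and schema validation, not icelake's actual encoding.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

// Hypothetical stand-in for a schema: just the column names.
#[derive(Debug, Clone)]
struct SimpleSchema {
    columns: Vec<String>,
}

// Builder that wraps an inner writer and needs a schema before building.
struct RowWriterBuilder<W: Write> {
    inner: W,
    schema: SimpleSchema,
}

impl<W: Write> RowWriterBuilder<W> {
    fn new(inner: W, schema: SimpleSchema) -> Self {
        RowWriterBuilder { inner, schema }
    }

    // Validate the schema, write a header line, then hand back the writer.
    fn build(mut self) -> io::Result<RowWriter<W>> {
        if self.schema.columns.is_empty() {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "empty schema"));
        }
        writeln!(self.inner, "{}", self.schema.columns.join(","))?;
        Ok(RowWriter { inner: self.inner, width: self.schema.columns.len() })
    }
}

struct RowWriter<W: Write> {
    inner: W,
    width: usize,
}

impl<W: Write> RowWriter<W> {
    // Reject rows whose width does not match the schema.
    fn write_row(&mut self, row: &[&str]) -> io::Result<()> {
        if row.len() != self.width {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "row width mismatch"));
        }
        writeln!(self.inner, "{}", row.join(","))
    }

    // Give back the inner writer, e.g. to inspect or flush it.
    fn into_inner(self) -> W {
        self.inner
    }
}

// Reader side: wrap an inner reader, skip the header line, and collect the rows.
fn read_rows<R: Read>(inner: R) -> io::Result<Vec<Vec<String>>> {
    let mut lines = BufReader::new(inner).lines();
    if let Some(header) = lines.next() {
        header?;
    }
    let mut rows: Vec<Vec<String>> = Vec::new();
    for line in lines {
        rows.push(line?.split(',').map(|s| s.to_string()).collect());
    }
    Ok(rows)
}
```

Because the builder is generic over any Write sink, the same code runs against an in-memory Vec<u8> in tests, much as the snippets above use opendal's Memory service.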
Does icelake intend to support higher level Iceberg operations?
Such as:
- Read arrow RecordBatches from Parquet files after schema evolution (ignoring deleted columns and appending newly added columns).
- Combine files of different types (data, position deletes, equality deletes).
Or is icelake just a base library for the Iceberg format, with all high-level operations left to the application?
Yes, we should cover those high level operations.
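A minimal sketch of the two operations asked about, assuming a single data file whose rows are maps keyed by Iceberg field ID and a single i64 equality-delete column. project, apply_deletes, Row, and Value are hypothetical names, not icelake or iceberg-rust APIs. The sketch ignores sequence numbers, the file paths carried by position deletes, and NULL equality keys, all of which a real reader must handle.

```rust
use std::collections::{HashMap, HashSet};

// A single cell; None models NULL.
type Value = Option<i64>;

// A data-file row keyed by Iceberg field ID (not by column name).
type Row = HashMap<i32, Value>;

// Project a row written under an older schema onto the current schema's field IDs.
// Fields dropped from the current schema are ignored; newly added fields read as NULL.
fn project(row: &Row, current_field_ids: &[i32]) -> Vec<Value> {
    current_field_ids
        .iter()
        .map(|id| row.get(id).cloned().unwrap_or(None))
        .collect()
}

// Drop rows hit by position deletes (row positions within this data file) or by
// equality deletes (rows whose value in `eq_field` matches a deleted key).
fn apply_deletes(
    rows: &[Row],
    deleted_positions: &HashSet<usize>,
    eq_field: i32,
    deleted_keys: &HashSet<i64>,
) -> Vec<Row> {
    rows.iter()
        .enumerate()
        .filter(|(pos, _)| !deleted_positions.contains(pos))
        .filter(|(_, row)| match row.get(&eq_field) {
            Some(Some(key)) => !deleted_keys.contains(key),
            // Missing or NULL keys are kept in this simplified sketch.
            _ => true,
        })
        .map(|(_, row)| row.clone())
        .collect()
}
```

Keying rows by field ID rather than by column name is what lets a renamed column still resolve correctly after schema evolution.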
Heyo! I'm curious, what is the difference between this project and https://github.com/apache/iceberg-rust? I see it's being developed by the same team, but both are stated as "rust implementation of iceberg".
We began with icelake as a Rust implementation of Iceberg, but we later shifted our focus to direct contributions upstream. Now, icelake serves primarily as a staging area to test our concepts and ensure compatibility with existing applications. Ultimately, icelake will be integrated into iceberg-rust.
Makes sense! I'm excited for iceberg-rust and hope to make some contributions.
Closing in favor of https://github.com/icelake-io/icelake/issues/279