iceberg-rust feat: support create partition table for non REST catalog

fixes: #578

Aug 23 '24 15:08 FANNG1

Hi, @FANNG1 Thanks for your contribution. The reason why we use UnboundPartitionSpec rather PartitionSpec is to simplify the usage of this method. PartitionSpec is bound to a schema, and spec id, field ids are supposed to be meaningful. These ids are not supposed to be passed by user, but by the catalog implementation.

Aug 30 '24 02:08 liurenjie1024

Hi, @FANNG1 Thanks for your contribution. The reason why we use UnboundPartitionSpec rather PartitionSpec is to simplify the usage of this method. PartitionSpec is bound to a schema, and spec id, field ids are supposed to be meaningful. These ids are not supposed to be passed by user, but by the catalog implementation.

I agree that PartitionSpec is bound to a schema which is not user friendly, but when building UnboundPartitionSpec, we need source column id not name, it seems not friendly too? How about Using a separate TableRequestBuilder to build the TableCreation, in which PartitionSpec is generated by a more user friendly interfaces, WDYT?

pub struct TableCreation {
    /// The name of the table.
    pub name: String,
    /// The location of the table.
    #[builder(default, setter(strip_option))]
    pub location: Option<String>,
    /// The schema of the table.
    pub schema: Schema,
    /// The partition spec of the table, could be None.
    #[builder(default, setter(strip_option, into))]
    pub partition_spec: Option<PartitionSpec>,
    /// The sort order of the table.
    #[builder(default, setter(strip_option))]
    pub sort_order: Option<SortOrder>,
    /// The properties of the table.
    #[builder(default)]
    pub properties: HashMap<String, String>,
}

Aug 30 '24 10:08 FANNG1

The reason we use PartitionSpec was that some fields, such as spec id, partition field id should not be passed by user when creating a table. But I think you are right when a user wants to create partition spec of a table, it would be more nature to build against a schema. This maybe confusing to user because their spec id and partition field id will be lost, it seems that there is no type safe approach to ensure, and we need to add some doc to explain it.

Sep 02 '24 02:09 liurenjie1024

Thanks @FANNG1 for this pr. However I think there are some prepartion work before we can actually finished this pr. If we can narrow down the goal of this pr to change type for UnboundPartitonSpec to PartitionSpec only, we can merge this.

@liurenjie1024 , thanks for your review, let's keep consistent with what means change type for UnboundPartitonSpec to PartitionSpec only. The PR consistent serval parts:

Change UnboundPartitonSpec to PartitionSpec in TableCreation
Use PartitionSpec to build the table metadata
Integration tests about creating and showing partition table
Examples about create partition table and read data from table.

What you mean is just keeping the first part?

Sep 02 '24 10:09 FANNG1

Thanks @FANNG1 for this pr. However I think there are some prepartion work before we can actually finished this pr. If we can narrow down the goal of this pr to change type for UnboundPartitonSpec to PartitionSpec only, we can merge this.

@liurenjie1024 , thanks for your review, let's keep consistent with what means change type for UnboundPartitonSpec to PartitionSpec only. The PR consistent serval parts:

Change UnboundPartitonSpec to PartitionSpec in TableCreation

Use PartitionSpec to build the table metadata

Integration tests about creating and showing partition table

Examples about create partition table and read data from table.

What you mean is just keeping the first part?

Yes, I think keeping only part 1 is a good start. Add PartitionSpec is sth how similar to a transaction with two actions:

Add partition spec
Set default partition spec.

I think it's better to implement them in transaction api.

Sep 04 '24 09:09 liurenjie1024