iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

feat: support create partition table for non REST catalog

Open FANNG1 opened this issue 1 year ago • 5 comments

fixes: #578

FANNG1 avatar Aug 23 '24 15:08 FANNG1

Hi, @FANNG1 Thanks for your contribution. The reason why we use UnboundPartitionSpec rather PartitionSpec is to simplify the usage of this method. PartitionSpec is bound to a schema, and spec id, field ids are supposed to be meaningful. These ids are not supposed to be passed by user, but by the catalog implementation.

liurenjie1024 avatar Aug 30 '24 02:08 liurenjie1024

Hi, @FANNG1 Thanks for your contribution. The reason why we use UnboundPartitionSpec rather PartitionSpec is to simplify the usage of this method. PartitionSpec is bound to a schema, and spec id, field ids are supposed to be meaningful. These ids are not supposed to be passed by user, but by the catalog implementation.

I agree that PartitionSpec is bound to a schema which is not user friendly, but when building UnboundPartitionSpec, we need source column id not name, it seems not friendly too? How about Using a separate TableRequestBuilder to build the TableCreation, in which PartitionSpec is generated by a more user friendly interfaces, WDYT?

pub struct TableCreation {
    /// The name of the table.
    pub name: String,
    /// The location of the table.
    #[builder(default, setter(strip_option))]
    pub location: Option<String>,
    /// The schema of the table.
    pub schema: Schema,
    /// The partition spec of the table, could be None.
    #[builder(default, setter(strip_option, into))]
    pub partition_spec: Option<PartitionSpec>,
    /// The sort order of the table.
    #[builder(default, setter(strip_option))]
    pub sort_order: Option<SortOrder>,
    /// The properties of the table.
    #[builder(default)]
    pub properties: HashMap<String, String>,
}

FANNG1 avatar Aug 30 '24 10:08 FANNG1

The reason we use PartitionSpec was that some fields, such as spec id, partition field id should not be passed by user when creating a table. But I think you are right when a user wants to create partition spec of a table, it would be more nature to build against a schema. This maybe confusing to user because their spec id and partition field id will be lost, it seems that there is no type safe approach to ensure, and we need to add some doc to explain it.

liurenjie1024 avatar Sep 02 '24 02:09 liurenjie1024

Thanks @FANNG1 for this pr. However I think there are some prepartion work before we can actually finished this pr. If we can narrow down the goal of this pr to change type for UnboundPartitonSpec to PartitionSpec only, we can merge this.

@liurenjie1024 , thanks for your review, let's keep consistent with what means change type for UnboundPartitonSpec to PartitionSpec only. The PR consistent serval parts:

  1. Change UnboundPartitonSpec to PartitionSpec in TableCreation
  2. Use PartitionSpec to build the table metadata
  3. Integration tests about creating and showing partition table
  4. Examples about create partition table and read data from table.

What you mean is just keeping the first part?

FANNG1 avatar Sep 02 '24 10:09 FANNG1

Thanks @FANNG1 for this pr. However I think there are some prepartion work before we can actually finished this pr. If we can narrow down the goal of this pr to change type for UnboundPartitonSpec to PartitionSpec only, we can merge this.

@liurenjie1024 , thanks for your review, let's keep consistent with what means change type for UnboundPartitonSpec to PartitionSpec only. The PR consistent serval parts:

  1. Change UnboundPartitonSpec to PartitionSpec in TableCreation
  2. Use PartitionSpec to build the table metadata
  3. Integration tests about creating and showing partition table
  4. Examples about create partition table and read data from table.

What you mean is just keeping the first part?

Yes, I think keeping only part 1 is a good start. Add PartitionSpec is sth how similar to a transaction with two actions:

  1. Add partition spec
  2. Set default partition spec.

I think it's better to implement them in transaction api.

liurenjie1024 avatar Sep 04 '24 09:09 liurenjie1024