Make `location` in `TableCreation` optional
I'm not sure, but wouldn't it be more natural for
Catalogto manage the location of the table and the table metadata itself? If so, thelocationfield of theTableCreationstruct would better be deleted, or at least madeOptional. (For now, I'm thinking about ignoring that field when test-implementing the function.)Thanks.
Originally posted by @zeodtr in https://github.com/apache/iceberg-rust/issues/54#issuecomment-1734895392
What do you think? cc @Fokko @liurenjie1024 @JanKaul
Did I understand correctly that the location field in the TableCreation struct relates to the location field in the table-metadata?
If so, I would manage the location as part of the table metadata. If we aim to create v2 iceberg tables by default, I think the field should not be optional. It we want to enable the creation of v1 tables we could make it optional.
@zeodtr What kind of catalog are you trying to implement?
Did I understand correctly that the
locationfield in theTableCreationstruct relates to thelocationfield in the table-metadata?
Do I understand correctly that @zeodtr's idea is to have the location field in the table-metadata managed by the catalog, making it optional for the catalog to determine the actual location?
@zeodtr What kind of catalog are you trying to implement?
What I'm thinking is as explained by @Xuanwo:
Do I understand correctly that @zeodtr's idea is to have the location field in the table-metadata managed by the catalog, making it optional for the catalog to determine the actual location?
I'm trying to implement a Catalog that manages the table and table metadata by itself, including where to store/retrieve them.
So, the Catalog wants to determine the location by itself, rather than take it as an argument.
(Edited: 'itself' to 'by itself')
Ah okay, now I get it. Thanks for the explanation.
I'm not really sure what the implications of this are. I think the iceberg crate still has to assume that the location is not managed by the catalog in general.
I think by the openapi spec also says location field is optional: https://github.com/apache/iceberg/blob/621d301b17417af4b71a0e8c93db9db6f8266839/open-api/rest-catalog-open-api.yaml#L1948
And it's reasonable to make it optional since a catalog may decide the directory layout of all tables by itself.
I'm not really sure what the implications of this are. I think the iceberg crate still has to assume that the location is not managed by the catalog in general.
I think this comment by @JanKaul summarizes the issue. The answer to the question is: it depends.
For the REST catalog, the metadata is managed by the catalog. For the rest (Hive, Glue, Dynamo, SQL etc) the metadata is managed by the crate. I think it is reasonable to make it optional.
In PyIceberg it is still required, but that's mostly since we haven't seen any situation where it was missing.
I'm going to make this change.