iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

Make `location` in `TableCreation` optional

Open Xuanwo opened this issue 2 years ago • 9 comments

I'm not sure, but wouldn't it be more natural for Catalog to manage the location of the table and the table metadata itself? If so, the location field of the TableCreation struct would better be deleted, or at least made Optional. (For now, I'm thinking about ignoring that field when test-implementing the function.)

Thanks.

Originally posted by @zeodtr in https://github.com/apache/iceberg-rust/issues/54#issuecomment-1734895392

Xuanwo avatar Sep 26 '23 07:09 Xuanwo

What do you think? cc @Fokko @liurenjie1024 @JanKaul

Xuanwo avatar Sep 26 '23 07:09 Xuanwo

Did I understand correctly that the location field in the TableCreation struct relates to the location field in the table-metadata?

If so, I would manage the location as part of the table metadata. If we aim to create v2 iceberg tables by default, I think the field should not be optional. It we want to enable the creation of v1 tables we could make it optional.

JanKaul avatar Sep 26 '23 08:09 JanKaul

@zeodtr What kind of catalog are you trying to implement?

JanKaul avatar Sep 26 '23 08:09 JanKaul

Did I understand correctly that the location field in the TableCreation struct relates to the location field in the table-metadata?

Do I understand correctly that @zeodtr's idea is to have the location field in the table-metadata managed by the catalog, making it optional for the catalog to determine the actual location?

Xuanwo avatar Sep 26 '23 08:09 Xuanwo

@zeodtr What kind of catalog are you trying to implement?

What I'm thinking is as explained by @Xuanwo:

Do I understand correctly that @zeodtr's idea is to have the location field in the table-metadata managed by the catalog, making it optional for the catalog to determine the actual location?

I'm trying to implement a Catalog that manages the table and table metadata by itself, including where to store/retrieve them. So, the Catalog wants to determine the location by itself, rather than take it as an argument. (Edited: 'itself' to 'by itself')

zeodtr avatar Sep 26 '23 09:09 zeodtr

Ah okay, now I get it. Thanks for the explanation.

I'm not really sure what the implications of this are. I think the iceberg crate still has to assume that the location is not managed by the catalog in general.

JanKaul avatar Sep 26 '23 11:09 JanKaul

I think by the openapi spec also says location field is optional: https://github.com/apache/iceberg/blob/621d301b17417af4b71a0e8c93db9db6f8266839/open-api/rest-catalog-open-api.yaml#L1948

And it's reasonable to make it optional since a catalog may decide the directory layout of all tables by itself.

liurenjie1024 avatar Sep 27 '23 01:09 liurenjie1024

I'm not really sure what the implications of this are. I think the iceberg crate still has to assume that the location is not managed by the catalog in general.

I think this comment by @JanKaul summarizes the issue. The answer to the question is: it depends.

For the REST catalog, the metadata is managed by the catalog. For the rest (Hive, Glue, Dynamo, SQL etc) the metadata is managed by the crate. I think it is reasonable to make it optional.

In PyIceberg it is still required, but that's mostly since we haven't seen any situation where it was missing.

Fokko avatar Sep 27 '23 17:09 Fokko

I'm going to make this change.

Xuanwo avatar Oct 13 '23 12:10 Xuanwo