api: Metastore Catalog API design
Iceberg catalogs generally work in one of two ways. The REST catalog handles updates to the table metadata internally. All other catalogs store the metadata in a metadata.json file and keep a pointer to the current metadata file.
I would like to define a trait MetastoreCatalog (the name is not important) that works only with the location of the metadata.json file. Additionally, I would add a blanket implementation of Catalog as follows:
trait MetastoreCatalog {
    ...
}
impl<T> Catalog for T
where
    T: MetastoreCatalog,
{
    ...
}
This way the non-REST catalogs only have to implement the metadata file logic through the MetastoreCatalog trait and will automatically implement Catalog. The goal is to reduce the redundancy between the catalog implementations.
The trait is going to look similar to the Catalog trait. The biggest difference will be in the update_table method, which will look similar to this:
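To make the blanket-implementation idea concrete, here is a minimal sketch. It is not the actual Catalog trait from the crate: both traits are reduced to two synchronous methods each, and every method name, the String error type, and the InMemoryMetastore type are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for the proposed trait: it only knows
/// how to read and write the location of the current metadata.json file.
trait MetastoreCatalog {
    fn metadata_location(&self, table: &str) -> Option<String>;
    fn set_metadata_location(&self, table: &str, location: &str);
}

/// Hypothetical, simplified stand-in for the full Catalog trait.
trait Catalog {
    fn table_exists(&self, table: &str) -> bool;
    fn register_table(&self, table: &str, metadata_location: &str) -> Result<(), String>;
}

/// Blanket implementation: every MetastoreCatalog is automatically a Catalog.
impl<T> Catalog for T
where
    T: MetastoreCatalog,
{
    fn table_exists(&self, table: &str) -> bool {
        self.metadata_location(table).is_some()
    }

    fn register_table(&self, table: &str, metadata_location: &str) -> Result<(), String> {
        // Note: this check-then-set is not atomic; a real catalog would need
        // the backing store to enforce uniqueness.
        if self.table_exists(table) {
            return Err(format!("table {table} already exists"));
        }
        self.set_metadata_location(table, metadata_location);
        Ok(())
    }
}

/// Hypothetical in-memory backend; it implements only the pointer logic.
#[derive(Default)]
struct InMemoryMetastore {
    pointers: Mutex<HashMap<String, String>>,
}

impl MetastoreCatalog for InMemoryMetastore {
    fn metadata_location(&self, table: &str) -> Option<String> {
        self.pointers.lock().unwrap().get(table).cloned()
    }

    fn set_metadata_location(&self, table: &str, location: &str) {
        self.pointers
            .lock()
            .unwrap()
            .insert(table.to_string(), location.to_string());
    }
}
```

With this in place, calling register_table or table_exists on an InMemoryMetastore works without the backend ever mentioning Catalog. One consequence of the blanket impl is that a type implementing MetastoreCatalog cannot also provide its own hand-written Catalog implementation, since the two impls would conflict.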
async fn update_table(&self, table: &TableIdent, metadata_location: &str, previous_metadata_location: &str) -> Result<String>;
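One way such a method could behave is as a compare-and-swap on the metadata pointer: the update succeeds only if the stored location still equals previous_metadata_location. The sketch below is synchronous and in-memory to keep it self-contained. The TableIdent stand-in, the UpdateError type, the InMemoryPointers store, and the choice to return the new current location are all assumptions, not part of the proposal.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the crate's TableIdent.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct TableIdent(String);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum UpdateError {
    TableNotFound,
    /// Another writer committed first; `current` is the location now stored.
    Conflict { current: String },
}

/// Hypothetical in-memory store mapping each table to its current metadata file.
#[derive(Default)]
struct InMemoryPointers {
    pointers: Mutex<HashMap<TableIdent, String>>,
}

impl InMemoryPointers {
    /// Swap the pointer to `metadata_location` only if it still points at
    /// `previous_metadata_location`. Returns the new current location
    /// (an assumption about what the returned String means).
    fn update_table(
        &self,
        table: &TableIdent,
        metadata_location: &str,
        previous_metadata_location: &str,
    ) -> Result<String, UpdateError> {
        // Holding the lock for the whole check-and-write makes the swap atomic.
        let mut pointers = self.pointers.lock().unwrap();
        let slot = pointers.get_mut(table).ok_or(UpdateError::TableNotFound)?;
        if slot.as_str() != previous_metadata_location {
            return Err(UpdateError::Conflict { current: slot.clone() });
        }
        *slot = metadata_location.to_string();
        Ok(slot.clone())
    }
}
```

In this sketch, a writer that prepared its commit against a stale metadata file gets a Conflict error and can retry against the location reported in the error.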
If you agree with this approach, I will work on a PR to add the trait.
Hi, thanks for the suggestion.
Personally, I think we should prioritize adding some implementations first and then figure out how to merge duplicated code. For example, we can add a storage catalog and a Hive catalog first to verify our design.
What are your thoughts?
I haven't implemented other catalogs before, so it's hard for me to predict whether this abstraction is correct. Since this is just to avoid code duplication, I also agree with @Xuanwo that we should do concrete implementations first and do the refactoring later when necessary, without risking any breaking changes.
Catalog API has been added.