Iceberg Catalog Migration Tool
A tool to migrate ownership of an Iceberg table from one catalog to another, e.g. from Hive to Nessie.
Also, more generally, a migration strategy for adopting Nessie.
Nothing to migrate yet, so removing the 1.0.0 milestone
I want to discuss this, so adding it back to the milestone.
We should also support vice versa, and across Nessie instances.
Related to #126.
The goal of this ticket (as I understand it):
- Introduce a CLI tool (it can live in the nessie-tools module) that migrates (that is, moves rather than copies) Iceberg tables from a non-Nessie catalog to a Nessie catalog. Many catalogs support Iceberg tables (Hive, Hadoop, Glue, DynamoDB, ECS, JDBC, REST, custom). We should focus first on Hive, Glue, and Hadoop, in that order.
- The CLI tool should take as inputs the branch name to register the tables on (falling back to the default branch if none is given), plus the source catalog and Nessie catalog configurations.
- The CLI tool should connect to the source catalog and the Nessie catalog using these configurations and perform the migration. There can be thousands of tables or even more; the migration is expected to take seconds to minutes, depending on the available resources.
- After migration, a table should no longer be available in the source catalog; it should be available and queryable from the Nessie catalog.
- The tool should not touch or move the table's metadata or data files stored in the file system.
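To make the "move, not copy" semantics above concrete, here is a minimal sketch in Python. It is not the proposed tool and does not use the Iceberg or Nessie APIs: `InMemoryCatalog`, `migrate`, and the example table names and paths are all hypothetical, and a catalog is modelled simply as a mapping from table name to metadata file location. Branch selection and real catalog connections are left out. In a real implementation, Iceberg's `Catalog.registerTable` and `Catalog.dropTable` with `purge=false` would presumably play the roles of `register` and `unregister`.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCatalog:
    """Hypothetical stand-in for a catalog: table name -> metadata file location."""
    name: str
    tables: dict = field(default_factory=dict)

    def register(self, table, metadata_location):
        # Refuse to overwrite an entry that already exists in this catalog.
        if table in self.tables:
            raise ValueError(f"{table} already exists in {self.name}")
        self.tables[table] = metadata_location

    def unregister(self, table):
        # Removes only the catalog entry; files on storage are never touched.
        del self.tables[table]


def migrate(source, target, table_names):
    """Move tables from source to target; return (migrated, failed) name lists."""
    migrated, failed = [], []
    for table in table_names:
        try:
            location = source.tables[table]   # KeyError if missing in source
            target.register(table, location)  # ValueError if present in target
        except (KeyError, ValueError):
            failed.append(table)
            continue
        # Drop from the source only after the target registration succeeded,
        # so a failure never leaves a table registered in neither catalog.
        source.unregister(table)
        migrated.append(table)
    return migrated, failed


# Example with made-up table names and locations.
hive = InMemoryCatalog("hive", {
    "db.orders": "s3://bucket/orders/metadata/v3.metadata.json",
    "db.users": "s3://bucket/users/metadata/v1.metadata.json",
})
nessie = InMemoryCatalog("nessie", {
    "db.users": "s3://bucket/other/metadata/v1.metadata.json",
})
migrated, failed = migrate(hive, nessie, ["db.orders", "db.users", "db.missing"])
```

In this sketch a table is removed from the source only after it has been registered in the target, and only the metadata location string changes hands. A table that fails to register stays in the source catalog and is reported back to the caller.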
Note: I didn't mention any query engine, because the same metastore (that is, catalog) can be shared between engines. Spark, Flink, and Hive can all populate the same metastore through Iceberg's Hive catalog interface. Similarly, Nessie can run with any of these engines. So the migration is engine-agnostic.
Please refer to PR #5297 (on Iceberg) for an API that helps migrate Iceberg tables from any source catalog to any target catalog.