Iceberg Catalog Migration Tool

Open rymurr opened this issue 4 years ago • 6 comments

A tool to migrate ownership of an Iceberg table from, e.g., Hive to Nessie.

Also, more generally, a migration strategy for adopting Nessie.

rymurr avatar Apr 23 '21 14:04 rymurr

Nothing to migrate yet, so removing the 1.0.0 milestone

snazy avatar Sep 15 '21 10:09 snazy

Want to talk about it so adding back to milestone

rymurr avatar Sep 15 '21 10:09 rymurr

We should also support vice versa, and across Nessie instances.

harshm-dev avatar Oct 28 '21 09:10 harshm-dev

related to #126

harshm-dev avatar Oct 28 '21 11:10 harshm-dev

The goal of this ticket (as I understand it):

  • Introduce a CLI tool (it can live in the nessie-tools module) that migrates (read: moves, not copies) Iceberg tables from a non-Nessie catalog to the Nessie catalog. Many catalogs support Iceberg tables (Hive, Hadoop, Glue, DynamoDB, ECS, JDBC, REST, custom!). We should focus first on Hive, Glue, and Hadoop, in that order.
  • The CLI tool should take inputs such as the branch name to register tables on (falling back to the default branch if none is given), plus the source catalog and Nessie catalog configurations.
  • The CLI tool should connect to the source catalog and the Nessie catalog using those configurations and perform the migration. There can be thousands of tables or more; the migration is expected to take seconds to minutes, depending on the resources available.
  • After migration, a table should no longer be available in the source catalog; it should be available in the Nessie catalog and queryable there.
  • The tool should not touch or move the table's metadata or data files stored in the file system.
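
To make the "move, not copy" semantics in the list above concrete, here is a minimal sketch in Python. It is not the proposed tool and uses no Nessie or Iceberg API: `InMemoryCatalog`, `register_table`, `unregister_table`, `migrate_tables`, and the `s3://` paths are all hypothetical names invented for illustration. A catalog is modelled as a mapping from table identifier to the location of the table's current metadata file, so migrating only rewrites catalog entries and never touches files.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCatalog:
    """Hypothetical stand-in for a catalog: table identifier -> metadata file location."""
    name: str
    tables: dict = field(default_factory=dict)

    def register_table(self, identifier, metadata_location):
        # Add a catalog entry pointing at an existing metadata file.
        if identifier in self.tables:
            raise ValueError(f"{identifier} already exists in {self.name}")
        self.tables[identifier] = metadata_location

    def unregister_table(self, identifier):
        # Remove only the catalog entry; metadata and data files are left alone.
        return self.tables.pop(identifier)


def migrate_tables(source, target, identifiers):
    """Move catalog entries from source to target. Returns (migrated, failed)."""
    migrated, failed = [], []
    for identifier in identifiers:
        try:
            location = source.tables[identifier]
            # Register in the target first, so a failure leaves the source intact.
            target.register_table(identifier, location)
        except (KeyError, ValueError) as exc:
            failed.append((identifier, str(exc)))
            continue
        # Only after a successful registration is the source entry dropped.
        source.unregister_table(identifier)
        migrated.append(identifier)
    return migrated, failed


# Hypothetical example data: two tables in a "hive" catalog, an empty target.
hive = InMemoryCatalog("hive", {
    "db.t1": "s3://bucket/db/t1/metadata/v3.metadata.json",
    "db.t2": "s3://bucket/db/t2/metadata/v1.metadata.json",
})
nessie = InMemoryCatalog("nessie")

migrated, failed = migrate_tables(hive, nessie, ["db.t1", "db.t2", "db.missing"])
print("migrated:", migrated)
print("failed:", [name for name, _ in failed])
```

Registering in the target before unregistering from the source is a design choice of this sketch: a table that fails to register stays visible in the source catalog rather than disappearing from both.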

Note: I didn't mention any query engine, because the same metastore (read: catalog) can be shared between engines. Spark, Flink, and Hive can all populate the same metastore through Iceberg's Hive catalog interface. Similarly, Nessie can run with any of these engines. So the migration is engine-agnostic.

ajantha-bhat avatar Jun 01 '22 09:06 ajantha-bhat

Please refer to PR #5297 (on Iceberg) for an API that helps migrate Iceberg tables from any catalog (source) to any catalog (target).

Mehul2500 avatar Jul 21 '22 11:07 Mehul2500