nessie Content aware merge operations

Merge operations in Nessie "only" copy the commits made since the common ancestor from one reference onto another. Nessie itself does not interpret the meaning of the contents in the commits. While the Nessie merge operation is technically correct and works as designed, it prevents multiple, "nested" merge operations.

Example:

CREATE TABLE foo...;
-- User 1
CREATE BRANCH branch_a;
INSERT INTO foo VALUES ('abc');
-- User 2
CREATE BRANCH branch_b;
INSERT INTO foo VALUES ('def');
-- User 1
MERGE branch_a INTO main;
SELECT * FROM foo ... ; -- returns 'abc'
-- User 2
MERGE branch_b INTO main; -- CONFLICT

(Note: the above behavior is true for all Nessie versions)
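
To make the conflict concrete, here is a minimal Python sketch of a key-level merge. It is only a toy model under stated assumptions, not Nessie's implementation: a branch is modeled as a dict from content key to an opaque metadata id, and the names `changed_keys`, `merge`, and the `metadata-*` values are made up for illustration.

```python
# Toy model: a branch maps content keys to opaque content ids.
# The merge never looks inside the content, it only compares ids per key.

def changed_keys(base, head):
    """Keys whose content differs between the common ancestor and a branch head."""
    keys = set(base) | set(head)
    return {k for k in keys if base.get(k) != head.get(k)}


def merge(base, source, target):
    """Key-level merge: fail if both sides changed the same key since the ancestor."""
    conflicts = changed_keys(base, source) & changed_keys(base, target)
    if conflicts:
        raise ValueError(f"conflict on keys: {sorted(conflicts)}")
    merged = dict(target)
    for k in changed_keys(base, source):
        if k in source:
            merged[k] = source[k]
        else:
            merged.pop(k, None)  # key was removed on the source side
    return merged


ancestor = {"foo": "metadata-0"}
branch_a = {"foo": "metadata-a"}  # User 1 inserted 'abc'
branch_b = {"foo": "metadata-b"}  # User 2 inserted 'def'

main = merge(ancestor, branch_a, ancestor)  # first merge succeeds
try:
    merge(ancestor, branch_b, main)  # table foo changed on both sides
    second_merge = "ok"
except ValueError as e:
    second_merge = str(e)
```

Both inserts are plain appends that could be combined, but a merge that treats the table metadata as an opaque value cannot know that.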

I think we need a "Nessie aware merge operation" in Iceberg itself that properly:

- cherry-picks the Iceberg snapshots to be merged onto the target branch
- updates the current schema in the target branch, even if the source reference does not update any data

From some investigation, Iceberg already contains the code for the building blocks:

- SchemaUpdate.unionByNameWith() can merge two Schema objects
- SnapshotManager.cherrypick(long snapshotId) can cherry-pick one snapshot
- Schema.sameSchema() can compare two Schema objects (semantically equivalent)
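
As a rough illustration of the schema-related building blocks, the following Python sketch models a union-by-name merge and a schema comparison. It is a toy model and not the Iceberg code: columns are plain `(name, type)` tuples, and `union_by_name` and `same_schema` are hypothetical helpers that ignore field ids, nested types, and type promotion.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (name, type): toy stand-in for an Iceberg schema field


def union_by_name(target: List[Column], source: List[Column]) -> List[Column]:
    """Keep target columns in order and append source columns with new names."""
    target_types = dict(target)
    merged = list(target)
    for name, typ in source:
        if name not in target_types:
            merged.append((name, typ))
        elif target_types[name] != typ:
            # Assumption: this toy model rejects type changes outright.
            raise ValueError(f"incompatible types for column {name!r}")
    return merged


def same_schema(a: List[Column], b: List[Column]) -> bool:
    """Toy comparison: same columns, same types, same order."""
    return list(a) == list(b)


main_schema = [("id", "long"), ("data", "string")]
branch_schema = [("id", "long"), ("ts", "timestamp")]

merged_schema = union_by_name(main_schema, branch_schema)
# Only issue a schema update on the target if the union actually changed it.
needs_schema_update = not same_schema(merged_schema, main_schema)
```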

What's missing:

- Functionality to cherry-pick snapshots from a different Iceberg Table
- Functionality to perform all the schema updates and cherry-picks in a single commit (like a single call to PendingUpdate.commit() leading to a single Nessie commit)
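
The "single commit" requirement could look roughly like the sketch below: stage every schema update and cherry-pick against an in-memory copy of the metadata, then swap it in once. All names here (`Metadata`, `Catalog`, `MergeBuilder`) are hypothetical stand-ins for illustration, not existing Iceberg or Nessie classes.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Metadata:  # hypothetical stand-in for TableMetadata
    schema: Tuple[str, ...]
    snapshots: Tuple[int, ...]


@dataclass
class Catalog:  # hypothetical stand-in for the table operations doing the commit
    current: Metadata
    commits: int = 0

    def commit(self, base: Metadata, new: Metadata) -> None:
        if base != self.current:
            raise RuntimeError("concurrent update, retry the merge")
        self.current = new
        self.commits += 1


class MergeBuilder:
    """Stages schema unions and cherry-picks, then commits them in one step."""

    def __init__(self, catalog: Catalog):
        self.catalog = catalog
        self.base = catalog.current
        self.staged = catalog.current

    def union_schema(self, columns: Tuple[str, ...]) -> "MergeBuilder":
        extra = tuple(c for c in columns if c not in self.staged.schema)
        self.staged = replace(self.staged, schema=self.staged.schema + extra)
        return self

    def cherrypick(self, snapshot_id: int) -> "MergeBuilder":
        self.staged = replace(
            self.staged, snapshots=self.staged.snapshots + (snapshot_id,)
        )
        return self

    def commit(self) -> None:
        self.catalog.commit(self.base, self.staged)


catalog = Catalog(Metadata(schema=("id",), snapshots=(1,)))
MergeBuilder(catalog).union_schema(("id", "ts")).cherrypick(2).cherrypick(3).commit()
```

However many steps are staged, the catalog sees exactly one new metadata version, which is what would map to a single Nessie commit.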

It is unclear whether we have to cherry-pick all Iceberg snapshots since the common ancestor or whether it's sufficient to cherry-pick just the most recent Iceberg snapshot (and the current schema). Technically, the most recent Iceberg snapshot (and current schema) should be sufficient. But without the intermediate snapshots, the change history provided by the Iceberg snapshots would be lost or become incomplete.

The "snapshot log" in TableMetadata must stay consistent. I'm not sure whether SnapshotManager.cherrypick(long snapshotId) already takes care of that.

I also think that the functionality to do the above is not purely related to Nessie; it does not even have to touch the Nessie code in Iceberg. Strictly speaking, it is "just" Iceberg functionality that produces a new TableMetadata, which then gets committed via NessieTableOperations.

We can probably implement it as an Iceberg procedure next to CherrypickSnapshotProcedure for Spark 3.x.

The same mechanism should also be implemented for Delta Lake, but preferably as a separate issue / PR.

Oct 27 '21 09:10 snazy

Since we are talking about intelligent merge operations / content manipulation, would it make sense to support multiple parents in Nessie commits (like in git)?

With a content-aware merge, I guess the contents on the base branch may have non-trivial differences from both old base contents and the contents being merged. Therefore, it might be valuable to preserve the lineage of changes (unless the merge is fast-forward).

Oct 27 '21 23:10 dimas-b

A few observations:

- The merge isn't handled correctly if any of the following operations happen on the target branch after the fork: (a) update, (b) delete, (c) rewrites (compaction/sorting), (d) partition spec changes. Maybe we should put guardrails in place to avoid these situations. IMO, for (d) it should be possible to extend the logic to handle partition spec changes.
- For transplant, the same conditions would apply even in the source branch.
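
A guardrail along these lines could be sketched as follows. This is a minimal Python model under assumptions: the `Snapshot` class and its `spec_id` field are hypothetical, and mapping update/delete/rewrite to the Iceberg snapshot operation names "overwrite", "delete", and "replace" is my reading, not something verified against the merge code.

```python
from dataclasses import dataclass
from typing import List

# Snapshot operations treated as unsafe on the target branch after the fork.
UNSAFE_OPERATIONS = {"overwrite", "delete", "replace"}


@dataclass
class Snapshot:  # hypothetical, simplified view of an Iceberg snapshot
    snapshot_id: int
    operation: str
    spec_id: int = 0  # assumption: partition spec the snapshot was written with


def guardrail_violations(fork_spec_id: int, target_since_fork: List[Snapshot]) -> List[str]:
    """List the reasons why a content-aware merge should be refused."""
    problems = []
    for s in target_since_fork:
        if s.operation in UNSAFE_OPERATIONS:
            problems.append(f"snapshot {s.snapshot_id}: {s.operation}")
        if s.spec_id != fork_spec_id:
            problems.append(f"snapshot {s.snapshot_id}: partition spec changed")
    return problems


target = [
    Snapshot(10, "append"),
    Snapshot(11, "replace"),            # e.g. a compaction
    Snapshot(12, "append", spec_id=1),  # written with a new partition spec
]
violations = guardrail_violations(fork_spec_id=0, target_since_fork=target)
```

For transplant, the same check would presumably be run over the source snapshots as well.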

> It is unclear whether we have to cherry-pick all Iceberg snapshots since the common ancestor or whether it's sufficient to cherry-pick just the most recent Iceberg snapshot (and the current schema). Technically, the most recent Iceberg snapshot (and current schema) should be sufficient. But without the intermediate snapshots, the change history provided by the Iceberg snapshots would be lost or become incomplete.

SnapshotManager.cherrypick() handles this via the delta of added/deleted data files, but it ignores existing data files on the assumption that they are unchanged. For the merge case, we'll need an aggregated view of the added/deleted data files from all the snapshots since the point of fork. The two approaches are to cherry-pick the snapshots one by one, or to aggregate them and merge in a single snapshot. In DeltaLake too, each commit log file maintains only the delta.
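
To compare the two approaches, here is a small Python sketch that treats each snapshot as a pair of (added, deleted) file sets. It is a toy model, not Iceberg code; `apply_one_by_one` and `aggregate` are hypothetical helpers, and file paths are assumed to be unique (a deleted file is never re-added under the same path).

```python
def apply_one_by_one(files, deltas):
    """Cherry-pick each snapshot in order, keeping one history entry per snapshot."""
    history = []
    for added, deleted in deltas:
        files = (files - deleted) | added
        history.append(frozenset(files))
    return files, history


def aggregate(deltas):
    """Fold all per-snapshot deltas into a single net (added, deleted) pair."""
    added, deleted = set(), set()
    for snap_added, snap_deleted in deltas:
        for f in snap_deleted:
            if f in added:
                added.discard(f)  # added and deleted since the fork: cancels out
            else:
                deleted.add(f)    # existed before the fork: a real delete
        added |= snap_added
    return added, deleted


base_files = {"f0"}
deltas = [
    ({"f1"}, set()),   # append f1
    ({"f2"}, {"f1"}),  # rewrite f1 into f2
    ({"f3"}, {"f0"}),  # rewrite f0 into f3
]

final_files, history = apply_one_by_one(base_files, deltas)
net_added, net_deleted = aggregate(deltas)
aggregated_files = (base_files - net_deleted) | net_added
```

Both routes end with the same set of live files, but only the one-by-one route keeps the intermediate states, and only the aggregated route needs to track every added file in memory to detect cancellations.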

Some thoughts in favour of cherry-picking each commit:

- Within a Nessie merge, we create a copy of each commit on the target branch. The content linked to each copied commit should ideally represent a merged snapshot at that hash.
- Aggregating added/deleted files across snapshots would require more memory, because deleted files have to be kept around to detect interim cancellations (a file that is added and then deleted again).
- Some level of history will be retained.
- Aggregation would have to be written separately for both DeltaLake and Iceberg.

Cons of cherry-picking each commit:

- Merge will become a multi-step operation end to end (starting from the Iceberg client).
- It'll create more commits.

Oct 28 '21 06:10 harshm-dev

#6631 adds the Nessie side support for this

Apr 20 '23 10:04 snazy
