iceberg-python

Support writing to a branch

Open Fokko opened this issue 1 year ago • 15 comments

Feature Request / Improvement

Right now writes are hardcoded to always go to the main branch. It would be great to make this configurable.

Fokko avatar Jan 26 '24 19:01 Fokko

The API to write to a branch should look something like:

def append(self, df: pa.Table, branch: str = MAIN_BRANCH):
    ...
def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_TRUE, branch: str = MAIN_BRANCH):
    ...

But in order to write to a branch, the branch needs to be created first.

From https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-branches:

the branch must exist before performing the write. The operation does not create the branch if it does not exist.
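
To make the rule quoted above concrete, here is a minimal in-memory sketch. `ToyTable` is a hypothetical stand-in, not the PyIceberg `Table` class, and it stores plain rows instead of Arrow data. The only things taken from this thread are the `branch: str = MAIN_BRANCH` parameter and the requirement that a write never creates a missing branch. The error type and message are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAIN_BRANCH = "main"


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table (not the PyIceberg Table)."""

    # Maps branch name -> rows visible on that branch.
    branches: Dict[str, List[dict]] = field(
        default_factory=lambda: {MAIN_BRANCH: []}
    )

    def append(self, rows: List[dict], branch: str = MAIN_BRANCH) -> None:
        # The write must not create the branch if it is missing.
        if branch not in self.branches:
            raise ValueError(f"Branch does not exist: {branch}")
        self.branches[branch] = self.branches[branch] + rows


table = ToyTable()
table.append([{"id": 1}])  # no branch given, so the write targets main

error = None
try:
    table.append([{"id": 2}], branch="audit")  # "audit" was never created
except ValueError as exc:
    error = str(exc)
```

Defaulting `branch` to `MAIN_BRANCH` keeps existing callers unchanged, which is presumably why the proposed signatures use a keyword argument with a default.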

kevinjqliu avatar Jan 27 '24 19:01 kevinjqliu

First pass, just refactoring #312

kevinjqliu avatar Jan 27 '24 19:01 kevinjqliu

We first need a create branch API.

Then update places currently gated by MAIN_BRANCH.

  1. https://github.com/apache/iceberg-python/blob/9e0394939bab8d6b26cdde6f71173bdd42b55b2e/pyiceberg/table/metadata.py#L124
  2. https://github.com/apache/iceberg-python/blob/9e0394939bab8d6b26cdde6f71173bdd42b55b2e/pyiceberg/table/__init__.py#L592

And finally, add a test for writing to a branch.
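
The three steps above could be sketched roughly as follows. This is a toy model under stated assumptions, not PyIceberg code: `ToyMetadata`, `create_branch`, and `commit` are hypothetical names, and snapshots are reduced to integer ids with a parent pointer. It only illustrates resolving the parent snapshot from the target branch instead of from a hardcoded `MAIN_BRANCH`.

```python
from typing import Dict, Optional

MAIN_BRANCH = "main"


class ToyMetadata:
    """Hypothetical, simplified metadata: snapshot parents plus named refs."""

    def __init__(self) -> None:
        self.parents: Dict[int, Optional[int]] = {}  # snapshot id -> parent id
        self.refs: Dict[str, int] = {}  # branch name -> head snapshot id
        self._next_id = 1

    def create_branch(self, name: str, snapshot_id: int) -> None:
        # Step 1: an explicit API that creates a branch at an existing snapshot.
        if name in self.refs:
            raise ValueError(f"Branch already exists: {name}")
        if snapshot_id not in self.parents:
            raise ValueError(f"Unknown snapshot: {snapshot_id}")
        self.refs[name] = snapshot_id

    def commit(self, branch: str = MAIN_BRANCH) -> int:
        # Step 2: look up the parent via the target branch, not always via main.
        # Assumption of this sketch: only main may come into existence implicitly.
        if branch != MAIN_BRANCH and branch not in self.refs:
            raise ValueError(f"Branch does not exist: {branch}")
        parent = self.refs.get(branch)  # None only for the first commit to main
        snapshot_id = self._next_id
        self._next_id += 1
        self.parents[snapshot_id] = parent
        self.refs[branch] = snapshot_id
        return snapshot_id


# Step 3: a test-style walk-through of writing to a branch.
meta = ToyMetadata()
first = meta.commit()                    # first snapshot, on main
meta.create_branch("audit", first)       # audit starts at main's head
on_branch = meta.commit(branch="audit")  # advances audit only
```

After the last call, main still points at the first snapshot while the new snapshot's parent is the snapshot the branch was created from. A real integration test would probably want to assert the same property.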

kevinjqliu avatar Jan 27 '24 19:01 kevinjqliu

Also note that dev/provision.py, which is used for integration tests, already has statements to create tags and branches:

https://github.com/apache/iceberg-python/blob/9e0394939bab8d6b26cdde6f71173bdd42b55b2e/dev/provision.py#L146-L148

kevinjqliu avatar Jan 27 '24 19:01 kevinjqliu

I had an offline chat with @kevinjqliu. I will work on this, building off of the PR Kevin created.

Gowthami03B avatar Feb 15 '24 22:02 Gowthami03B

Hello, can I hop back on this train if no one else is actively working on this (again building off of kevin's work)? @kevinjqliu @Fokko

Gowthami03B avatar May 15 '24 18:05 Gowthami03B

@Gowthami03B definitely, I've assigned the issue to you!

kevinjqliu avatar May 15 '24 19:05 kevinjqliu

@Gowthami03B definitely, I've assigned the issue to you!

Opening this up to the community, as I am going to be out for the next month! @kevinjqliu

Gowthami03B avatar Jun 27 '24 04:06 Gowthami03B

@kevinjqliu @Fokko I can take this forward if no one is actively working on this.

vinjai avatar Jul 06 '24 08:07 vinjai

@vinjai yes! please go ahead.

kevinjqliu avatar Jul 11 '24 22:07 kevinjqliu

Hi @vinjai thank you very much for working on this issue. I'm just working through the list of open items to check if they are still actively being worked on. Are you still interested in contributing this feature to PyIceberg? 🧊

sungwy avatar Sep 24 '24 15:09 sungwy

Hey @sungwy, I am working on this at the moment. I will open the PR for review by next week.

vinjai avatar Sep 29 '24 22:09 vinjai

PR is ready for review

vinjai avatar Oct 17 '24 22:10 vinjai