iceberg
Add StartTransaction API to REST multi-table transaction support
Proposed Change
We have identified an issue with the current commit-only multi-table transaction support. This proposal analyzes the isolation guarantee that the commit-only approach could break and proposes introducing a StartTransaction API to solve the problem.
Proposal document
https://docs.google.com/document/d/10tfqETygf2BLA34CoZLxK3v5xk1BWUNKFA9WE8X_w-U/edit#heading=h.kbf1q7197nxq
Specifications
- [X] Table
- [ ] View
- [X] REST
- [ ] Puffin
- [ ] Encryption
- [ ] Other
Just adding the original doc for reference: https://docs.google.com/document/d/1UxXifU8iqP_byaW4E2RuKZx1nobxmAvc5urVcWas1B8/edit#heading=h.6sa1rpsxiuke
I just wanted to clarify that what's currently in REST is not multi-table transaction support. It's purely an endpoint that allows a multi-table commit without providing any API semantics around transaction isolation. Actual multi-table transaction support is what's described in https://docs.google.com/document/d/1UxXifU8iqP_byaW4E2RuKZx1nobxmAvc5urVcWas1B8/edit#heading=h.6sa1rpsxiuke, and there's a prototype available in https://github.com/apache/iceberg/pull/6948.
Given that #6948 isn't done yet, it seems too early to talk about REST-related changes for multi-table transaction support, unless you had something else in mind here @jackye1995?
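To make the distinction between a multi-table commit and a transaction with isolation semantics more concrete, here is a toy sketch. It is only an illustration under stated assumptions, not the analysis from either design doc and not the Iceberg REST API: `ToyCatalog`, `commit_tables`, and `start_transaction` are hypothetical names, and table state is reduced to an integer version per table.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog; NOT the Iceberg REST API."""
    versions: dict = field(default_factory=dict)  # table name -> current version

    def load(self, table):
        # Returns whatever version is current at the time of the call.
        return self.versions[table]

    def commit_tables(self, requirements, updates):
        # Atomic multi-table commit: check every requirement, then apply every update.
        for table, expected in requirements.items():
            if self.versions[table] != expected:
                raise RuntimeError(f"conflict on {table}")
        for table in updates:
            self.versions[table] += 1

    def start_transaction(self):
        # Hypothetical: pin the versions of all tables at transaction start.
        return dict(self.versions)


def commit_only_reads():
    catalog = ToyCatalog({"orders": 1, "customers": 1, "report": 1})
    seen = {"orders": catalog.load("orders")}
    # A concurrent writer atomically updates both tables between our two reads.
    catalog.commit_tables({"orders": 1, "customers": 1}, ["orders", "customers"])
    seen["customers"] = catalog.load("customers")
    # Our own commit only carries a requirement on "report", so it succeeds
    # even though the two reads came from different points in time.
    catalog.commit_tables({"report": 1}, ["report"])
    return seen


def pinned_reads():
    catalog = ToyCatalog({"orders": 1, "customers": 1, "report": 1})
    pinned = catalog.start_transaction()
    seen = {"orders": pinned["orders"]}
    # Same concurrent writer as above.
    catalog.commit_tables({"orders": 1, "customers": 1}, ["orders", "customers"])
    seen["customers"] = pinned["customers"]
    return seen
```

In this toy model, `commit_only_reads()` observes `orders` at version 1 and `customers` at version 2, a combination that never existed in the catalog, while `pinned_reads()` observes both tables at version 1. A real design would also have to decide how the server tracks and expires the pinned state, which this sketch leaves out.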
I see, thanks for the context. I remember this PR; I thought the conclusion was to just do multi-table commit. How about we use this proposal to track the full "multi-table transaction" support? I think the full support entails the concept of starting a transaction (createTransaction in your API), which needs to be server-aware. We can discuss these two proposals together. What do you think?
We can definitely rename this proposal to track the Catalog Transaction API support (aka multi-table transactions), but I don't recall that we have concluded on just doing a multi-table commit. I'll rename this proposal to reflect the work mentioned in the doc, and we can add anything else that needs to be discussed on top of that.
> I don't recall that we have concluded on just doing a multi-table commit.
Yeah, that's probably just my misunderstanding, since multi-table commit was what was eventually added.
So just to be clear, this will be only for the REST catalog right? Do we consider this feature also for other catalogs? Because I see you write that in the Google doc "Implementing multi-table TX support for other catalogs" is a non-goal, but I did not see any OpenAPI specification description in the doc.
> So just to be clear, this will be only for the REST catalog right? Do we consider this feature also for other catalogs? Because I see you write that in the Google doc "Implementing multi-table TX support for other catalogs" is a non-goal, but I did not see any OpenAPI specification description in the doc.
The scope of the design doc / impl is to add all of the required core APIs in order to support multi-table transactions in the first place. Adding support for REST would be the next logical step in showing that multi-table transactions actually work. The APIs need to be designed in a way that other catalogs would theoretically be able to support multi-table transactions but in practice only REST / Nessie might be able to support it.
The reason I haven't done any REST spec work yet is that the core APIs and the impl haven't been solidified yet, and my focus back then shifted to adding view support.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Guys, is there any update on the progress of multi-table transactions with Nessie or the REST catalog?
@heman026 I didn't have resources to get back to this proposal but there's another proposal being discussed here: https://lists.apache.org/thread/r5otylsrm4txd4oxyv7c6scdwrbolck9