trino
trino copied to clipboard
Fix merge api #15585
Description
The new MERGE API introduced in https://github.com/trinodb/trino/pull/13926 Fails to pass the relevant updated columns for the connector which eliminates the ability to perform in-place updates on specific columns, which leads to full scan read and write. This cause a severe performance degradations.
Additional context and related issues
This fix introduces a new RowChangeParadigm
- UPDATE_PARTIAL_COLUMNS
, which deals with this issue.
And also an API change to support passing the updated columns.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. ( ) Release notes are required, with the following suggested text:
# Section
* Fix some things. ({issue}`issuenumber`)
@rz-vastdata please review
@djsstarburst @kokosing please review, notes:
- Backward compatibility: I changed the number of parameters in the MERGE API to accommodate in place updates.
- Currently, Cassandra connector is failing some test on different error message than expected, Cassandra expects error of unsupported UPDATE/ DELETE operations, and failing for unsupported mergeParadigm query (which is necessary for our fix), I guess that there are a few ways to fix it, what do you recommend?
@djsstarburst Thanks for the review, please have another look, I modified the Phoenix connector to work with the new paradigm, there is one open issue yet I was hoping to get some assistance from Phoenix maintainers. Thanks 🙏
@djsstarburst ping :)
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
👋 @gilco-vastdata and @djsstarburst .. I assume you are still working on this and will continue the discussion and update code as needed.
Hello @electrum , I hope you're well. Could you kindly take a moment to review this pull request? @djsstarburst has already provided his feedback and specifically asked for your input. Please consider his comments during your review. Thank you!
I don't think we need a new row change paradigm here. They both seem to do the same thing. Instead, CHANGE_ONLY_UPDATED_COLUMNS
should support modifying only the affected columns. This is how it should have worked in the beginning, but wasn't done given the huge scope of implementing MERGE.
I don't think we need a new row change paradigm here. They both seem to do the same thing.
IIUC, current row change paradigm doesn't support updating a partial list of column - since the actual column list (to be updated) is not passed to ConnectorMergeSink
:
https://trinodb.slack.com/archives/CP1MUNEUX/p1692011678685409
In order to support the existing connectors' behavior, we have introduced a new paradigm - but if it's OK by you, we can reuse the existing one (i.e. CHANGE_ONLY_UPDATED_COLUMNS
).
WDYT?
We should update the existing one to support updating a partial column list. This is a minor API difference and not a fundamentally different paradigm / model.
Thanks @electrum! We will update this pull request :)
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
Reopened and applied stale-ignore label cc @gilco-vastdata and @electrum
@rz-vastdata , can you please follow up with what needs to be change, so we can merge this PR?
We should update the existing one to support updating a partial column list. This is a minor API difference and not a fundamentally different paradigm / model.
There is still work needed to be done - it will probably take some time.