trino icon indicating copy to clipboard operation
trino copied to clipboard

Fix merge api #15585

Open gilco-vastdata opened this issue 1 year ago • 14 comments

Description

The new MERGE API introduced in https://github.com/trinodb/trino/pull/13926 Fails to pass the relevant updated columns for the connector which eliminates the ability to perform in-place updates on specific columns, which leads to full scan read and write. This cause a severe performance degradations.

Additional context and related issues

This fix introduces a new RowChangeParadigm - UPDATE_PARTIAL_COLUMNS, which deals with this issue. And also an API change to support passing the updated columns.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. ( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

gilco-vastdata avatar Nov 19 '23 11:11 gilco-vastdata

@rz-vastdata please review

gilco-vastdata avatar Nov 19 '23 12:11 gilco-vastdata

@djsstarburst @kokosing please review, notes:

  1. Backward compatibility: I changed the number of parameters in the MERGE API to accommodate in place updates.
  2. Currently, Cassandra connector is failing some test on different error message than expected, Cassandra expects error of unsupported UPDATE/ DELETE operations, and failing for unsupported mergeParadigm query (which is necessary for our fix), I guess that there are a few ways to fix it, what do you recommend?

gilco-vastdata avatar Nov 19 '23 14:11 gilco-vastdata

@djsstarburst Thanks for the review, please have another look, I modified the Phoenix connector to work with the new paradigm, there is one open issue yet I was hoping to get some assistance from Phoenix maintainers. Thanks 🙏

gilco-vastdata avatar Dec 06 '23 18:12 gilco-vastdata

@djsstarburst ping :)

gilco-vastdata avatar Dec 17 '23 08:12 gilco-vastdata

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jan 29 '24 17:01 github-actions[bot]

👋 @gilco-vastdata and @djsstarburst .. I assume you are still working on this and will continue the discussion and update code as needed.

mosabua avatar Jan 29 '24 17:01 mosabua

Hello @electrum , I hope you're well. Could you kindly take a moment to review this pull request? @djsstarburst has already provided his feedback and specifically asked for your input. Please consider his comments during your review. Thank you!

gilco-vastdata avatar Feb 04 '24 08:02 gilco-vastdata

I don't think we need a new row change paradigm here. They both seem to do the same thing. Instead, CHANGE_ONLY_UPDATED_COLUMNS should support modifying only the affected columns. This is how it should have worked in the beginning, but wasn't done given the huge scope of implementing MERGE.

electrum avatar Feb 20 '24 19:02 electrum

I don't think we need a new row change paradigm here. They both seem to do the same thing.

IIUC, current row change paradigm doesn't support updating a partial list of column - since the actual column list (to be updated) is not passed to ConnectorMergeSink: https://trinodb.slack.com/archives/CP1MUNEUX/p1692011678685409

In order to support the existing connectors' behavior, we have introduced a new paradigm - but if it's OK by you, we can reuse the existing one (i.e. CHANGE_ONLY_UPDATED_COLUMNS).

WDYT?

rz-vastdata avatar Feb 25 '24 12:02 rz-vastdata

We should update the existing one to support updating a partial column list. This is a minor API difference and not a fundamentally different paradigm / model.

electrum avatar Feb 28 '24 19:02 electrum

Thanks @electrum! We will update this pull request :)

rz-vastdata avatar Mar 02 '24 17:03 rz-vastdata

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Mar 25 '24 17:03 github-actions[bot]

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

github-actions[bot] avatar Apr 15 '24 17:04 github-actions[bot]

Reopened and applied stale-ignore label cc @gilco-vastdata and @electrum

mosabua avatar Apr 30 '24 15:04 mosabua

@rz-vastdata , can you please follow up with what needs to be change, so we can merge this PR?

fgelcer avatar Jul 03 '24 10:07 fgelcer

We should update the existing one to support updating a partial column list. This is a minor API difference and not a fundamentally different paradigm / model.

There is still work needed to be done - it will probably take some time.

rz-vastdata avatar Jul 04 '24 07:07 rz-vastdata