insert+replace partitions incremental strategy
Summary
This PR implements insert+replace strategy discussed in #128 which does the following:
- Creates a new staging table with the same structure as the target table.
- Inserts data into the staging table.
- Replaces partitions in the target table from the staging table.
Advantages:
- Only the involved partitions are replaced: this is much cheaper than reinserting the full table which is implemented in other strategies.
- If an insertion fails, the target table is not affected.
Checklist
Delete items not relevant to your PR:
- [x] Unit and integration tests covering the common scenarios were added
- [x] A human-readable description of the changes was provided to include in CHANGELOG
- [x] For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials
At the time of publishing the PR is in WIP (work in progress) state since I need an advice from the community. Thus, neither documentation nor tests are updated.
Hi @simpl1g and @genzgd, I've wrapped up this PR for insert+replace strategy implementation: added some fancy documentation in the README and threw in integration tests (taking a page from lw-deletes). Motivation for the feature is given in the PR's description.
Some discussions above are unresolved, waiting on your pearls of wisdom there :)
Please take a peek at the PR when you get a chance, and hit me up with your thoughts. Hoping we can merge the feature soon and let the good times roll with the new strategy! :)
Hi @simpl1g and @genzgd! This is a kind reminder about my PR which is ready for your review. We have successfully battle-tested it internally ✅
We install my version from GitHub currently. It would be nice if you can approve it and release to PyPI. If any adjustments are required, please let me know! 🙏
@bryzgaloff I apologize that we haven't yet had the resources to fully review this PR. As you may have noticed we've been focused on bug fixes and compatibility with the new dbt releases. Please know that we very much appreciate the contribution (especially with test cases and real world usage) and your work is next on the roadmap as we get time.
If you have a chance to resolve the conflicts over the next few weeks that would be appreciated and make the review just a bit easier.
@bryzgaloff Thanks again for your contribution! I would like to review your PR and merge it within the next few days. Can you sync your fork with the main repo? We upgraded dbt-core to 1.8.0.
After syncing I'll review this one right away.
Hi @BentsiLeviav thank you (and the other reviewers, of course!) for your participation. I am not actively using the plugin right now, but I may get back to handling your review feedback late next week. If the conflicts are not too critical, I might be able to resolve them quickly.
Hi @BentsiLeviav I have rebased my contribution onto the latest main of this repository. Also, I have renamed the strategy to insert_replace. Please have a look! 👋
Does this repo has automated tests? I see there are workflows to be approved.
I do not have a quick infra to retest the contribution after the rebase, but if there is not testing workflow, I will perform the manual testing.
Hi @BentsiLeviav and @genzgd — will you have a chance review and merge the PR soon? I have updated everything according to the review feedback and all the checks have passed. I would like to avoid the need to rebase it again 😅
Hi @bryzgaloff Huge thanks for this. I will review and update you within the next few days.
Thank you @BentsiLeviav for the approval! What are the next steps for the PR to be merged?
Hi @bryzgaloff Sure, thank you for your contribution!
Before merging this, could you please add to the doc that this feature is experimental, and wasn't tested with cluster setup? It is crucial to highlight these 2 points.
Once we are done with that, I'll merge your PR.
Thanks again for your work!
Never mind, I'll take care of it :)
Thank you for all you help and merge, @BentsiLeviav! 🤝