[Feature] Support Chain Table
Search before asking
- [x] I searched in the issues and found nothing similar.
Motivation
Chain table is a new table type designed for data warehouse scenarios in which full data is stored periodically but the incremental change between periods is small. The main benefits of a chain table include incremental computation and storage, significant storage savings, and faster data pipelines.
Solution
On top of the warehouse table, two new branches, delta and snapshot, are added. The delta branch stores newly changed data, and the snapshot branch stores the full data produced by chain compaction. During writing, data is written to the corresponding branch according to the branch configuration. During reading, the read strategy depends on whether full data exists for the current partition. Taking a daily ODS binlog dump scenario with a weekly compaction cycle as an example, and using t$delta and t$snapshot to denote the delta and snapshot branches respectively, we introduce the layout, compaction, writing, and reading operations of the chain table.
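To illustrate the read strategy described above, here is a minimal in-memory sketch. It is not Paimon's API: the class `ChainTableSketch` and its methods `writeDelta`, `read`, and `compact` are hypothetical. It assumes partitions are keyed by day, rows are primary-key upserts (key to value), and deletes are not modeled. A read returns the snapshot partition directly when full data exists for that day. Otherwise it chains the nearest earlier snapshot with every delta partition after it, up to and including the requested day.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of a chain table (not the Paimon implementation). */
public class ChainTableSketch {
    // Models t$delta: partition day -> rows changed on that day (primary key -> value).
    private final NavigableMap<LocalDate, Map<String, String>> delta = new TreeMap<>();
    // Models t$snapshot: partition day -> full data produced by chain compaction.
    private final NavigableMap<LocalDate, Map<String, String>> snapshot = new TreeMap<>();

    /** Writes one day's incremental changes to the delta branch (upsert by key). */
    public void writeDelta(LocalDate day, Map<String, String> changes) {
        delta.computeIfAbsent(day, d -> new HashMap<>()).putAll(changes);
    }

    /** Returns the full data of a partition, choosing the strategy by whether a snapshot exists. */
    public Map<String, String> read(LocalDate day) {
        Map<String, String> full = snapshot.get(day);
        if (full != null) {
            // Full data exists for this partition: read the snapshot branch directly.
            return new HashMap<>(full);
        }
        // No full data: start from the nearest earlier snapshot (if any)...
        Map.Entry<LocalDate, Map<String, String>> base = snapshot.lowerEntry(day);
        Map<String, String> result =
                base == null ? new HashMap<>() : new HashMap<>(base.getValue());
        // ...then apply each later delta partition, in day order, up to the requested day.
        NavigableMap<LocalDate, Map<String, String>> chain =
                base == null
                        ? delta.headMap(day, true)
                        : delta.subMap(base.getKey(), false, day, true);
        for (Map<String, String> changes : chain.values()) {
            result.putAll(changes);
        }
        return result;
    }

    /** Chain compaction: materializes the chained full data of a day into the snapshot branch. */
    public void compact(LocalDate day) {
        snapshot.put(day, read(day));
    }
}
```

With a weekly compaction cycle, `compact` would run once per week and every other day would be served from the latest snapshot plus at most six delta partitions. A `TreeMap` is used here only because `lowerEntry` and `subMap` make the "nearest earlier snapshot, then later deltas" lookup direct.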
Anything else?
https://cwiki.apache.org/confluence/display/PAIMON/PIP-37%3A+Introduce+Chain+Table
Are you willing to submit a PR?
- [x] I'm willing to submit a PR!
The URL link is not correct, @Stefanietry.
@Stefanietry Hi, I apologize for not communicating with you in advance. I've written an implementation based on your proposal. I'm not sure if it meets your needs. If you'd like to do it yourself, I can also abandon this PR.
https://github.com/apache/paimon/pull/6380
@kaori-seasons Hi, we have submitted a PR. You can check whether it meets your needs. For details, please refer to https://github.com/apache/paimon/pull/6394