
Do not parse Engine from Clickhouse table definition

Open BorisTyshkevich opened this issue 2 years ago • 2 comments

The Sink Connector tries to parse the Clickhouse table engine definition to get the names of the version and deleted columns for ReplacingMergeTree.

That causes problems when creating tables with custom structures and other engines, such as ReplicatedReplacingMergeTree, CollapsingMergeTree, ReplicatedVersionedCollapsingMergeTree, EmbeddedRocksDB, Null, etc. It would be difficult to write robust code and test it against such a wide variety of engines (and new engines could be added to Clickhouse in the future). It would be better not to parse the engine definition at all.

The Sink Connector only needs the column list and column types for processing, so it could parse only those. The _version and _deleted columns could have fixed names, or their names could be defined in the config file:

clickhouse.table.version: "_version"
clickhouse.table.deleted: "_deleted"
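
A minimal sketch of how such settings might be resolved, using only the column list and no engine parsing. This is an illustration, not the connector's actual code: the function `resolve_special_columns`, the `SpecialColumns` type, and the choice to treat a missing column as "not present" (rather than an error) are all assumptions. The `columns` mapping stands in for whatever the connector reads from the table's column metadata.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults taken from the proposal above; the key names are the proposed ones,
# not existing connector settings.
DEFAULT_VERSION_COLUMN = "_version"
DEFAULT_DELETED_COLUMN = "_deleted"


@dataclass(frozen=True)
class SpecialColumns:
    """Names of the version/deleted columns, or None if the table lacks them."""
    version: Optional[str]
    deleted: Optional[str]


def resolve_special_columns(
    config: Mapping[str, str],
    columns: Mapping[str, str],
) -> SpecialColumns:
    """Pick the version/deleted column names from config, not from the engine.

    `columns` maps column name -> column type (hypothetical input, e.g. the
    result of reading the table's column list). A configured name that is not
    in the table is reported as None, so tables on engines without these
    columns (for example Null) are still accepted.
    """
    version = config.get("clickhouse.table.version", DEFAULT_VERSION_COLUMN)
    deleted = config.get("clickhouse.table.deleted", DEFAULT_DELETED_COLUMN)
    return SpecialColumns(
        version=version if version in columns else None,
        deleted=deleted if deleted in columns else None,
    )
```

With this approach the table engine never has to be inspected: a table that has the configured columns gets them filled in, and any other table is written to with its plain column list.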

BorisTyshkevich, Nov 07 '23 14:11

Currently RMT (ReplacingMergeTree) is the only supported engine. We may support MergeTree for history tables, but that's an enhancement.

aadant, Nov 09 '23 04:11

It's not a small enhancement but a feature that raises the overall functionality of the Connector to a much higher level: MVs in the style of the Kafka Engine, with any Clickhouse SQL functionality available.
The Null Engine is what is needed first, not MergeTree.

For example, it could be used as a workaround for aggregation tables until exactly-once delivery is implemented. See the discussion here - https://github.com/Altinity/clickhouse-sink-connector/issues/364#issuecomment-1803237794

Many complicated DWH transformations could be created with Null Engine and MVs.

BorisTyshkevich, Nov 09 '23 06:11