Support CollapsingMergeTree in ClickHouse to avoid the ORDER BY limitation
Problem
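Jitsu can store all events from anonymous users and, once a user is identified, update those events in the DWH with the user id. When the identification_nodes are received, the events are replayed, and in the case of ClickHouse the ReplacingMergeTree engine (or its replicated variant) is expected to de-duplicate them. For this to work, the ORDER BY (sorting key) values of the replayed event must match those of the originally stored event, which only holds as long as ORDER BY contains nothing but stable identifiers such as eventn_ctx_event_id. If we include an identification_nodes column such as user_id in ORDER BY, the rows are not de-duplicated, because user_id was null in the first event and is set in the replayed one. Being restricted to eventn_ctx_event_id in ORDER BY has a performance penalty: queries that filter on user_id cannot benefit from the primary index and have to scan far more data.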
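A minimal Python sketch of why user_id in the sorting key breaks de-duplication. It assumes a simplified model of a ReplacingMergeTree merge (no version column, so the last inserted row per sorting key wins) and uses illustrative column values; it models the engine's semantics and is not ClickHouse or Jitsu code.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def replacing_merge(rows: List[Row], order_by: Tuple[str, ...]) -> List[Row]:
    """Simplified model of a ReplacingMergeTree merge (assumption: no version
    column): among rows with equal sorting-key values, the last inserted wins."""
    latest: Dict[Tuple[Optional[str], ...], Row] = {}
    for row in rows:
        key = tuple(row.get(col) for col in order_by)
        latest[key] = row  # a later insert with the same key replaces the earlier one
    return list(latest.values())


# Illustrative data: the anonymous event as first written, then the same
# event replayed after identification.
anonymous = {"eventn_ctx_event_id": "e1", "user_id": None}
replayed = {"eventn_ctx_event_id": "e1", "user_id": "u42"}

by_event_id = replacing_merge([anonymous, replayed], ("eventn_ctx_event_id",))
by_user_and_event = replacing_merge(
    [anonymous, replayed], ("user_id", "eventn_ctx_event_id")
)

print(len(by_event_id))        # 1: the replayed row replaced the anonymous one
print(len(by_user_and_event))  # 2: the keys differ, so both rows survive
```

With only eventn_ctx_event_id in the key the replay behaves like an update; once user_id is part of the key, the null and non-null versions no longer share a key and both remain.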
Solution
Could we consider using CollapsingMergeTree (https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/collapsingmergetree/) instead?
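This requires Jitsu to add a sign column (1 or -1) to every inserted event. On identification, Jitsu would re-insert the previously stored events unchanged but with sign -1, which cancels them, and insert the new versions carrying the identification_nodes values with sign 1.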
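The identification step could be sketched in Python as below. The function name identification_rows and the column name sign are hypothetical (the engine lets you name the sign column yourself, e.g. CollapsingMergeTree(sign)). The important detail is that each cancel row repeats the originally stored values, including user_id = null, so it has exactly the same sorting key as the row it cancels.

```python
from typing import Dict, List

Row = Dict[str, object]


def identification_rows(stored: List[Row], user_id: str) -> List[Row]:
    """Hypothetical helper: for each previously stored anonymous row, emit a
    'cancel' row (an exact copy with sign = -1) followed by a 'state' row
    that carries the user id (sign = 1)."""
    out: List[Row] = []
    for row in stored:
        # The cancel row must keep the old sorting-key values unchanged.
        cancel = dict(row, sign=-1)
        # The new state row is the same event, now identified.
        state = dict(row, user_id=user_id, sign=1)
        out.extend([cancel, state])
    return out


stored = [{"eventn_ctx_event_id": "e1", "user_id": None, "sign": 1}]
rows = identification_rows(stored, "u42")
for r in rows:
    print(r)
```

Both rows would be written in the same insert, so the cancel and state rows arrive together.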
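This way, the identification_nodes column(s) can be added to the sorting key: the cancel row has the same sorting key as the original anonymous row, so the two collapse even though the new, identified row has a different key.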
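To illustrate why user_id can then sit in the sorting key, here is a simplified Python model of the collapse rules described in the CollapsingMergeTree documentation, applied per sorting key with rows taken in insertion order. The row values and the sign column name are illustrative assumptions. In real ClickHouse, merges run in the background at unspecified times, so queries would still need FINAL or aggregation over the sign column (e.g. sum(sign)) to see collapsed results.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def collapsing_merge(
    rows: List[Row], order_by: Tuple[str, ...], sign_col: str = "sign"
) -> List[Row]:
    """Simplified model of a CollapsingMergeTree merge: rows sharing a
    sorting key are grouped (in insertion order) and reduced per the
    documented rules."""
    groups: Dict[Tuple[object, ...], List[Row]] = {}
    for row in rows:
        key = tuple(row.get(c) for c in order_by)
        groups.setdefault(key, []).append(row)

    result: List[Row] = []
    for group in groups.values():
        states = [r for r in group if r[sign_col] == 1]
        cancels = [r for r in group if r[sign_col] == -1]
        if len(states) == len(cancels) and group[-1][sign_col] == 1:
            result.extend([cancels[0], states[-1]])  # first cancel + last state
        elif len(states) > len(cancels):
            result.append(states[-1])                # last state row
        elif len(cancels) > len(states):
            result.append(cancels[0])                # first cancel row
        # Otherwise (equal counts, last row is a cancel) nothing is kept.
    return result


rows = [
    {"user_id": None, "eventn_ctx_event_id": "e1", "sign": 1},   # original anonymous event
    {"user_id": None, "eventn_ctx_event_id": "e1", "sign": -1},  # cancel row on identification
    {"user_id": "u42", "eventn_ctx_event_id": "e1", "sign": 1},  # identified state row
]
merged = collapsing_merge(rows, ("user_id", "eventn_ctx_event_id"))
print(merged)
```

The (null, e1) pair cancels out, and only the identified row survives, even though user_id is part of the sorting key.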