Fix premature TTL column removal causing merge failures and wrong defaults
Changelog category (leave one):
- Bug Fix (user-visible misbehavior in an official stable release)
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed several issues caused by premature column removal in TTL. Resolves #88002
Documentation entry for user-facing changes
- [ ] Documentation is written (mandatory for new features)
Details
- Fix merge failures in GraphiteMergeTree, VersionedCollapsingMergeTree, ReplacingMergeTree, and AggregatingMergeTree when TTL-removed columns were still required by merge logic
- Fix incorrect skip index or data values caused by filling missing columns with type defaults instead of column DEFAULT expressions
- Fix missing OPTIMIZE ... DEDUPLICATE BY behavior when deduplication columns were removed by TTL
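For illustration, the first class of failure can be sketched with a schema along these lines (hypothetical DDL reconstructed from the error in the test output below; the actual regression test lives in `test_graphite_merge_tree/test.py`):

```sql
-- Hypothetical repro sketch: a GraphiteMergeTree whose version column has a
-- column TTL. Once the TTL expires, the column is dropped from older parts,
-- but the graphite rollup merge still requires it -- previously this raised
-- NOT_FOUND_COLUMN_IN_BLOCK during OPTIMIZE ... FINAL.
CREATE TABLE test.graphite
(
    metric String,
    value Float64,
    timestamp UInt32,
    date Date,
    updated UInt32 TTL date + INTERVAL 1 DAY
)
ENGINE = GraphiteMergeTree('graphite_rollup')
PARTITION BY toYYYYMM(date)
ORDER BY (metric, timestamp);

-- After the TTL expires and parts are rewritten without `updated`,
-- this merge used to throw "Not found column updated in block":
OPTIMIZE TABLE test.graphite PARTITION 200109 FINAL;
```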
Workflow [PR], commit [3deb13e1]
Summary: ❌
| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (arm_asan, targeted) | | failure | | |
| | 01459_manual_write_to_replicas_quorum_detach_attach | FAIL | cidb, issue | |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting: the query: | FAIL | cidb | |
| BuzzHouse (amd_msan) | | failure | | |
| | Unknown error | FAIL | cidb | |
Looks good. Could you please add a test for one of the engines mentioned in the changelog?
Sure. I've added more tests and fixed more issues.
Thanks for the PR!
Looks like the tests have failed, unfortunately =\
It should be fine now.
I wrote a test, unfortunately, it fails as following:
=================================== FAILURES ===================================
_______________________________ test_ttl_version _______________________________
[gw0] linux -- Python 3.10.12 /usr/bin/python3
test_graphite_merge_tree/test.py:534: in test_ttl_version
q("OPTIMIZE TABLE test.graphite PARTITION 200109 FINAL")
helpers/cluster.py:4255: in query
return self.client.query(
helpers/client.py:39: in wrap
return func(self, *args, **kwargs)
helpers/client.py:79: in query
).get_answer()
helpers/client.py:247: in get_answer
raise QueryRuntimeException(
E helpers.client.QueryRuntimeException: Client failed! Return code: 10, stderr: Received exception from server (version 25.11.1):
E Code: 10. DB::Exception: Received from 172.16.1.2:9000. DB::Exception: Not found column updated in block. There are only columns: metric, timestamp, value, date. Stack trace:
E
E 0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013eab75f
E 1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000cb63fce
E 2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000cb63a80
E 3. DB::Exception::Exception<String const&, String>(int, FormatStringHelperImpl<std::type_identity<String const&>::type, std::type_identity<String>::type>, String const&, String&&) @ 0x000000000dc115ab
E 4. DB::Block::getPositionByName(String const&, bool) const @ 0x0000000017681de4
E 5. DB::GraphiteRollupSortedAlgorithm::GraphiteRollupSortedAlgorithm(std::shared_ptr<DB::Block const>, unsigned long, DB::SortDescription, unsigned long, unsigned long, std::optional<unsigned long>, DB::Graphite::Params, long) @ 0x000000001ac60ef6
E 6. DB::IMergingTransform<DB::GraphiteRollupSortedAlgorithm>::IMergingTransform<std::shared_ptr<DB::Block const>&, unsigned long&, DB::SortDescription, unsigned long&, unsigned long&, std::optional<unsigned long>&, DB::Graphite::Params, long&>(unsigned long, std::shared_ptr<DB::Block const>, std::shared_ptr<DB::Block const>, bool, unsigned long, bool, std::shared_ptr<DB::Block const>&, unsigned long&, DB::SortDescription&&, unsigned long&, unsigned long&, std::optional<unsigned long>&, DB::Graphite::Params&&, long&) @ 0x0000000019e82d83
E 7. DB::MergePartsStep::transformPipeline(DB::QueryPipelineBuilder&, DB::BuildQueryPipelineSettings const&) @ 0x0000000019e762c6
E 8. DB::ITransformingStep::updatePipeline(std::vector<std::unique_ptr<DB::QueryPipelineBuilder, std::default_delete<DB::QueryPipelineBuilder>>, std::allocator<std::unique_ptr<DB::QueryPipelineBuilder, std::default_delete<DB::QueryPipelineBuilder>>>>, DB::BuildQueryPipelineSettings const&) @ 0x000000001ad51f57
E 9. DB::QueryPlan::buildQueryPipeline(DB::QueryPlanOptimizationSettings const&, DB::BuildQueryPipelineSettings const&, bool) @ 0x000000001ad96d80
E 10. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::createMergedStream() const @ 0x0000000019e6020d
E 11. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::prepare() const @ 0x0000000019e58ec1
E 12. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x0000000019e6234e
E 13. DB::MergeTask::execute() @ 0x0000000019e6bd62
E 14. DB::MergePlainMergeTreeTask::executeStep() @ 0x0000000019e4a0bf
E 15. DB::StorageMergeTree::merge(bool, String const&, bool, bool, std::vector<String, std::allocator<String>> const&, bool, std::shared_ptr<DB::MergeTreeTransaction> const&, PreformattedMessage&, bool) @ 0x0000000019812b46
E 16. DB::StorageMergeTree::optimize(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::shared_ptr<DB::IAST> const&, bool, bool, std::vector<String, std::allocator<String>> const&, bool, std::shared_ptr<DB::Context const>) @ 0x000000001981cfd1
E 17. DB::InterpreterOptimizeQuery::execute() @ 0x000000001904fbf7
E 18. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, std::unique_ptr<DB::ReadBuffer, std::default_delete<DB::ReadBuffer>>&, std::shared_ptr<DB::IAST>&, std::shared_ptr<DB::ImplicitTransactionControlExecutor>, std::function<void ()>) @ 0x0000000018fd0304
E 19. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x0000000018fca133
E 20. DB::TCPHandler::runImpl() @ 0x000000001a77a8b7
E 21. DB::TCPHandler::run() @ 0x000000001a79ce99
E 22. Poco::Net::TCPServerConnection::start() @ 0x000000001f86f407
E 23. Poco::Net::TCPServerDispatcher::run() @ 0x000000001f86f899
E 24. Poco::PooledThread::run() @ 0x000000001f836047
E 25. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001f834441
E 26. ? @ 0x0000000000094ac3
E 27. ? @ 0x00000000001268c0
E . (NOT_FOUND_COLUMN_IN_BLOCK)
E (query: OPTIMIZE TABLE test.graphite PARTITION 200109 FINAL)
You can launch it locally with `python -m ci.praktika run integration --test test_graphite_merge_tree/test.py::test_ttl_version` from the repo root; the clickhouse binary can be placed in ci/tmp.
Thanks! The test passes now.
I've updated the PR to address https://github.com/ClickHouse/ClickHouse/pull/37570 in a more general way, which also satisfies the second requirement of #4968.
Now it exposes additional issues: dependencies in default expressions are not handled correctly during column TTL. I’ll work on a proper fix.
Handling missing columns that have default expressions is non-trivial and currently unresolved (see https://github.com/ClickHouse/ClickHouse/issues/91127). For now, we conservatively avoid expiring such columns.
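To make the unresolved case concrete, here is a hypothetical sketch (table and column names invented for illustration) of a column whose DEFAULT expression depends on another column:

```sql
-- Hypothetical example of the hard case: `b` has both a column TTL and a
-- DEFAULT expression that depends on `a`. If the TTL removed `b` from a part,
-- reading it back would need to evaluate `a * 2` per row rather than filling
-- the type default 0 -- so such columns are conservatively not expired for now.
CREATE TABLE t
(
    ts DateTime,
    a UInt64,
    b UInt64 DEFAULT a * 2 TTL ts + INTERVAL 1 MONTH
)
ENGINE = MergeTree
ORDER BY ts;
```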
PR is ready. @jkartseva @Felixoid Could you help take another look?
Thank you, will take a look in the morning.
I can only approve that the original issue that we found in https://github.com/ClickHouse/ClickHouse/issues/88002 is addressed, since my tests are OK :+1:
CH Inc sync - tests failed
@jkartseva It looks like some tests in the private repo are failing. This might be related to the behavioral change introduced in this PR, which is also reflected in the updated test reference files.
I don't see that the failure in private is related to this PR, it looks like an error in our CI infra. Asking around..
@jkartseva I’ve resolved the merge conflict, but it still shows the same private-repo failure. Could you please take another look?
@amosbird Private had merge conflicts with https://github.com/ClickHouse/ClickHouse/pull/91569; I resolved them as follows:
diff --cc src/Storages/MergeTree/MergeTask.cpp
index c2253639fc1,a0fb92a411f..b4a6c1e8f5d
--- a/src/Storages/MergeTree/MergeTask.cpp
+++ b/src/Storages/MergeTree/MergeTask.cpp
@@@ -292,29 -292,17 +292,26 @@@ void MergeTask::ExecuteAndFinalizeHoriz
key_columns.insert(String(Nested::getColumnFromSubcolumn(name, storage_columns)));
}
- /// Force sign column for Collapsing mode
- if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::Collapsing)
+ /// Force sign column for Collapsing mode and VersionedCollapsing mode
+ if (!global_ctx->merging_params.sign_column.empty())
key_columns.emplace(global_ctx->merging_params.sign_column);
- /// Force version column for Replacing mode
- if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::Replacing)
- {
+ /// Force is_deleted column for Replacing mode
+ if (!global_ctx->merging_params.is_deleted_column.empty())
key_columns.emplace(global_ctx->merging_params.is_deleted_column);
- key_columns.emplace(global_ctx->merging_params.version_column);
- }
+ /// Force all columns params of Graphite mode.
+ if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::Graphite)
+ {
+ key_columns.emplace(global_ctx->merging_params.graphite_params.path_column_name);
+ key_columns.emplace(global_ctx->merging_params.graphite_params.time_column_name);
+ key_columns.emplace(global_ctx->merging_params.graphite_params.value_column_name);
+ key_columns.emplace(global_ctx->merging_params.graphite_params.version_column_name);
+ }
+
- /// Force sign column for VersionedCollapsing mode. Version is already in primary key.
- if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::VersionedCollapsing)
- key_columns.emplace(global_ctx->merging_params.sign_column);
+ /// Force version column for Replacing mode and VersionedCollapsing mode
+ if (!global_ctx->merging_params.version_column.empty())
+ key_columns.emplace(global_ctx->merging_params.version_column);
/// Force to merge at least one column in case of empty key
if (key_columns.empty())
Also, the sync did not go through for your last two commits.
I manually resolved the conflicts; the sync went through.
And now arm_tidy build is failing in private, but it's not related to this PR
Cool. Is there any other blocker to merge this?
The test test_graphite_merge_tree is in the skip list in private repository, which is why the asan_flaky check can't start properly, and this blocks private sync.
Is this something we can unblock? It looks like the CH Inc sync test is still stuck in “pending”.