ClickHouse icon indicating copy to clipboard operation
ClickHouse copied to clipboard

Fix premature TTL column removal causing merge failures and wrong defaults

Open amosbird opened this issue 2 months ago • 14 comments

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed several issues caused by premature column removal in TTL. Resolves #88002

Documentation entry for user-facing changes

  • [ ] Documentation is written (mandatory for new features)

Details

  1. Fix merge failures in GraphiteMergeTree, VersionedCollapsingMergeTree, ReplacingMergeTree, and AggregatingMergeTree when TTL-removed columns were still required by merge logic
  2. Fix incorrect skip index or data values caused by filling missing columns with type defaults instead of column DEFAULT expressions
  3. Fix missing OPTIMIZE ... DEDUPLICATE BY behavior when deduplication columns were removed by TTL

amosbird avatar Oct 21 '25 13:10 amosbird

Workflow [PR], commit [3deb13e1]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan, targeted) failure
01459_manual_write_to_replicas_quorum_detach_attach FAIL cidb, issue
BuzzHouse (amd_debug) failure
Logical error: 'Inconsistent AST formatting: the query: FAIL cidb
BuzzHouse (amd_msan) failure
Unknown error FAIL cidb

clickhouse-gh[bot] avatar Oct 21 '25 13:10 clickhouse-gh[bot]

Looks good. Could you please add a test for one of the engines mentioned in the changelog?

Sure. I've added more tests and fixed more issues.

amosbird avatar Oct 28 '25 09:10 amosbird

Thanks for the PR!

Looks like the tests has failed, unfortunately =\

Felixoid avatar Nov 03 '25 08:11 Felixoid

Looks like the tests has failed, unfortunately =\

It should be fine now.

amosbird avatar Nov 13 '25 07:11 amosbird

I wrote a test, unfortunately, it fails as following:

=================================== FAILURES ===================================
_______________________________ test_ttl_version _______________________________
[gw0] linux -- Python 3.10.12 /usr/bin/python3
test_graphite_merge_tree/test.py:534: in test_ttl_version
    q("OPTIMIZE TABLE test.graphite PARTITION 200109 FINAL")
helpers/cluster.py:4255: in query
    return self.client.query(
helpers/client.py:39: in wrap
    return func(self, *args, **kwargs)
helpers/client.py:79: in query
    ).get_answer()
helpers/client.py:247: in get_answer
    raise QueryRuntimeException(
E   helpers.client.QueryRuntimeException: Client failed! Return code: 10, stderr: Received exception from server (version 25.11.1):
E   Code: 10. DB::Exception: Received from 172.16.1.2:9000. DB::Exception: Not found column updated in block. There are only columns: metric, timestamp, value, date. Stack trace:
E
E   0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000013eab75f
E   1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000cb63fce
E   2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000cb63a80
E   3. DB::Exception::Exception<String const&, String>(int, FormatStringHelperImpl<std::type_identity<String const&>::type, std::type_identity<String>::type>, String const&, String&&) @ 0x000000000dc115ab
E   4. DB::Block::getPositionByName(String const&, bool) const @ 0x0000000017681de4
E   5. DB::GraphiteRollupSortedAlgorithm::GraphiteRollupSortedAlgorithm(std::shared_ptr<DB::Block const>, unsigned long, DB::SortDescription, unsigned long, unsigned long, std::optional<unsigned long>, DB::Graphite::Params, long) @ 0x000000001ac60ef6
E   6. DB::IMergingTransform<DB::GraphiteRollupSortedAlgorithm>::IMergingTransform<std::shared_ptr<DB::Block const>&, unsigned long&, DB::SortDescription, unsigned long&, unsigned long&, std::optional<unsigned long>&, DB::Graphite::Params, long&>(unsigned long, std::shared_ptr<DB::Block const>, std::shared_ptr<DB::Block const>, bool, unsigned long, bool, std::shared_ptr<DB::Block const>&, unsigned long&, DB::SortDescription&&, unsigned long&, unsigned long&, std::optional<unsigned long>&, DB::Graphite::Params&&, long&) @ 0x0000000019e82d83
E   7. DB::MergePartsStep::transformPipeline(DB::QueryPipelineBuilder&, DB::BuildQueryPipelineSettings const&) @ 0x0000000019e762c6
E   8. DB::ITransformingStep::updatePipeline(std::vector<std::unique_ptr<DB::QueryPipelineBuilder, std::default_delete<DB::QueryPipelineBuilder>>, std::allocator<std::unique_ptr<DB::QueryPipelineBuilder, std::default_delete<DB::QueryPipelineBuilder>>>>, DB::BuildQueryPipelineSettings const&) @ 0x000000001ad51f57
E   9. DB::QueryPlan::buildQueryPipeline(DB::QueryPlanOptimizationSettings const&, DB::BuildQueryPipelineSettings const&, bool) @ 0x000000001ad96d80
E   10. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::createMergedStream() const @ 0x0000000019e6020d
E   11. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::prepare() const @ 0x0000000019e58ec1
E   12. DB::MergeTask::ExecuteAndFinalizeHorizontalPart::execute() @ 0x0000000019e6234e
E   13. DB::MergeTask::execute() @ 0x0000000019e6bd62
E   14. DB::MergePlainMergeTreeTask::executeStep() @ 0x0000000019e4a0bf
E   15. DB::StorageMergeTree::merge(bool, String const&, bool, bool, std::vector<String, std::allocator<String>> const&, bool, std::shared_ptr<DB::MergeTreeTransaction> const&, PreformattedMessage&, bool) @ 0x0000000019812b46
E   16. DB::StorageMergeTree::optimize(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::shared_ptr<DB::IAST> const&, bool, bool, std::vector<String, std::allocator<String>> const&, bool, std::shared_ptr<DB::Context const>) @ 0x000000001981cfd1
E   17. DB::InterpreterOptimizeQuery::execute() @ 0x000000001904fbf7
E   18. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, std::unique_ptr<DB::ReadBuffer, std::default_delete<DB::ReadBuffer>>&, std::shared_ptr<DB::IAST>&, std::shared_ptr<DB::ImplicitTransactionControlExecutor>, std::function<void ()>) @ 0x0000000018fd0304
E   19. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x0000000018fca133
E   20. DB::TCPHandler::runImpl() @ 0x000000001a77a8b7
E   21. DB::TCPHandler::run() @ 0x000000001a79ce99
E   22. Poco::Net::TCPServerConnection::start() @ 0x000000001f86f407
E   23. Poco::Net::TCPServerDispatcher::run() @ 0x000000001f86f899
E   24. Poco::PooledThread::run() @ 0x000000001f836047
E   25. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001f834441
E   26. ? @ 0x0000000000094ac3
E   27. ? @ 0x00000000001268c0
E   . (NOT_FOUND_COLUMN_IN_BLOCK)
E   (query: OPTIMIZE TABLE test.graphite PARTITION 200109 FINAL)

You can launch it locally as python -m ci.praktika run integration --test test_graphite_merge_tree/test.py::test_ttl_version from the repo root, the clickhouse binary could be placed in ci/tmp

Felixoid avatar Nov 13 '25 23:11 Felixoid

I wrote a test, unfortunately, it fails as following:

Thanks! The test passes now.

amosbird avatar Nov 22 '25 18:11 amosbird

I've updated the PR to address https://github.com/ClickHouse/ClickHouse/pull/37570 in a more general way, which also satisfies the second requirement of #4968.

amosbird avatar Nov 27 '25 12:11 amosbird

Now it exposes additional issues: dependencies in default expressions are not handled correctly during column TTL. I’ll work on a proper fix.

amosbird avatar Nov 28 '25 09:11 amosbird

Handling missing columns that have default expressions is non-trivial and currently unresolved (see https://github.com/ClickHouse/ClickHouse/issues/91127). For now, we conservatively avoid expiring such columns.

PR is ready. @jkartseva @Felixoid Could you help take another look?

amosbird avatar Dec 03 '25 06:12 amosbird

Thank you, will take a look in the morning.

jkartseva avatar Dec 03 '25 06:12 jkartseva

I can only approve that the original issue that we found in https://github.com/ClickHouse/ClickHouse/issues/88002 is addressed, since my tests are OK :+1:

Felixoid avatar Dec 04 '25 19:12 Felixoid

CH Inc sync - tests failed

@jkartseva It looks like some tests in the private repo are failing. This might be related to the behavioral change introduced in this PR, which is also reflected in the updated test reference files.

amosbird avatar Dec 05 '25 04:12 amosbird

I don't see that the failure in private is related to this PR, it looks like an error in our CI infra. Asking around..

jkartseva avatar Dec 05 '25 22:12 jkartseva

I don't see that the failure in private is related to this PR, it looks like an error in our CI infra. Asking around..

@jkartseva I’ve resolved the merge conflict, but it still shows the same private-repo failure. Could you please take another look?

amosbird avatar Dec 10 '25 12:12 amosbird

@amosbird Private had merge conflicts with https://github.com/ClickHouse/ClickHouse/pull/91569, I resolved as following:

diff --cc src/Storages/MergeTree/MergeTask.cpp
index c2253639fc1,a0fb92a411f..b4a6c1e8f5d
--- a/src/Storages/MergeTree/MergeTask.cpp
+++ b/src/Storages/MergeTree/MergeTask.cpp
@@@ -292,29 -292,17 +292,26 @@@ void MergeTask::ExecuteAndFinalizeHoriz
              key_columns.insert(String(Nested::getColumnFromSubcolumn(name, storage_columns)));
      }

-     /// Force sign column for Collapsing mode
-     if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::Collapsing)
+     /// Force sign column for Collapsing mode and VersionedCollapsing mode
+     if (!global_ctx->merging_params.sign_column.empty())
          key_columns.emplace(global_ctx->merging_params.sign_column);

-     /// Force version column for Replacing mode
-     if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::Replacing)
-     {
+     /// Force is_deleted column for Replacing mode
+     if (!global_ctx->merging_params.is_deleted_column.empty())
          key_columns.emplace(global_ctx->merging_params.is_deleted_column);
-         key_columns.emplace(global_ctx->merging_params.version_column);
-     }

 +    /// Force all columns params of Graphite mode.
 +    if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::Graphite)
 +    {
 +        key_columns.emplace(global_ctx->merging_params.graphite_params.path_column_name);
 +        key_columns.emplace(global_ctx->merging_params.graphite_params.time_column_name);
 +        key_columns.emplace(global_ctx->merging_params.graphite_params.value_column_name);
 +        key_columns.emplace(global_ctx->merging_params.graphite_params.version_column_name);
 +    }
 +
-     /// Force sign column for VersionedCollapsing mode. Version is already in primary key.
-     if (global_ctx->merging_params.mode == MergeTreeData::MergingParams::VersionedCollapsing)
-         key_columns.emplace(global_ctx->merging_params.sign_column);
+     /// Force version column for Replacing mode and VersionedCollapsing mode
+     if (!global_ctx->merging_params.version_column.empty())
+         key_columns.emplace(global_ctx->merging_params.version_column);

      /// Force to merge at least one column in case of empty key
      if (key_columns.empty())

jkartseva avatar Dec 11 '25 02:12 jkartseva

Also, sync did not went through for your last two commits.

jkartseva avatar Dec 11 '25 02:12 jkartseva

I manually resolved the conflicts; the sync went through.

jkartseva avatar Dec 11 '25 02:12 jkartseva

And now arm_tidy build is failing in private, but it's not related to this PR

jkartseva avatar Dec 11 '25 06:12 jkartseva

And now arm_tidy build is failing in private, but it's not related to this PR

Cool. Is there any other blocker to merge this?

amosbird avatar Dec 11 '25 07:12 amosbird

The test test_graphite_merge_tree is in the skip list in private repository, which is why the asan_flaky check can't start properly, and this blocks private sync.

jkartseva avatar Dec 12 '25 01:12 jkartseva

The test test_graphite_merge_tree is in the skip list in private repository, which is why the asan_flaky check can't start properly, and this blocks private sync.

Is this something we can unblock? It looks like the CH Inc sync test is still stuck in “pending”.

amosbird avatar Dec 12 '25 11:12 amosbird