Bug: Migration failures with distributed ClickHouse cluster using SENTRY_DISTRIBUTED_CLICKHOUSE_TABLES
Summary:
When attempting to perform Snuba migrations on a distributed ClickHouse cluster with the following configuration (in settings.py), migrations fail unexpectedly. The issue is reproducible with Snuba version 25.7.0.
Snuba version: 25.7.0
Configuration:
import os
from snuba.settings import *
env = os.environ.get
DEBUG = env("DEBUG", "0").lower() in ("1", "true")
SENTRY_DISTRIBUTED_CLICKHOUSE_TABLES = True
MIGRATIONS_LOCK_TIMEOUT = int(env("MIGRATIONS_LOCK_TIMEOUT", "600"))
MIGRATIONS_BATCH_SIZE = int(env("MIGRATIONS_BATCH_SIZE", "1"))
CLICKHOUSE_MUTATIONS_SYNC = int(env("CLICKHOUSE_MUTATIONS_SYNC", "1"))
CLICKHOUSE_ALTER_SYNC = int(env("CLICKHOUSE_ALTER_SYNC", "1"))
CLICKHOUSE_REPLICATION_ALTER_PARTITIONS_SYNC = int(env("CLICKHOUSE_REPLICATION_ALTER_PARTITIONS_SYNC", "2"))
CLUSTERS = [
    {
        "host": env("CLICKHOUSE_HOST", "sentry-clickhouse-headless"),
        "port": int(9000),
        "secure": env("CLICKHOUSE_SECURE", False),
        "ca_certs": env("CLICKHOUSE_CA_CERTS", None),
        "verify": env("CLICKHOUSE_VERIFY", False),
        "user": env("CLICKHOUSE_USER", "default"),
        "password": env("CLICKHOUSE_PASSWORD", ""),
        "max_connections": int(os.environ.get("CLICKHOUSE_MAX_CONNECTIONS", 100)),
        "database": env("CLICKHOUSE_DATABASE", "default"),
        "http_port": 8123,
        "storage_sets": {
            "cdc",
            "discover",
            "eap_items",
            "events",
            "events_ro",
            "metrics",
            "migrations",
            "outcomes",
            "querylog",
            "sessions",
            "transactions",
            "profiles",
            "functions",
            "replays",
            "generic_metrics_sets",
            "generic_metrics_distributions",
            "search_issues",
            "generic_metrics_counters",
            "spans",
            "events_analytics_platform",
            "group_attributes",
            "generic_metrics_gauges",
            "metrics_summaries",
            "profile_chunks",
        },
        "single_node": False,
        "cluster_name": "default",
        "distributed_cluster_name": "default",
    },
]
REDIS_HOST = "sentry-sentry-redis-master"
REDIS_PORT = 6379
REDIS_PASSWORD = env("REDIS_PASSWORD", "")
REDIS_DB = int(env("REDIS_DB", 1))
Steps to Reproduce:
- Set up a distributed ClickHouse cluster.
- Configure Snuba as above, ensuring SENTRY_DISTRIBUTED_CLICKHOUSE_TABLES = True.
- Run Snuba migrations.
Expected Behavior: Migrations should complete successfully using distributed tables.
Actual Behavior: Migrations fail with ClickHouse error code 8 while creating the materialized view spans_num_attrs_mv in the EVENTS_ANALYTICS_PLATFORM storage set (full log below).
Environment:
- Snuba: 25.7.0
- ClickHouse: 23.12.3
Error log:
2025-07-31T13:29:26.490163562Z 2025-07-31 13:29:26,490 Block "" send time: 0.000038
2025-07-31T13:29:26.508136781Z {"module": "snuba.migrations.operations", "event": "Failed to execute operation on StorageSetKey.EVENTS_ANALYTICS_PLATFORM, target: OperationTarget.LOCAL\nCREATE MATERIALIZED VIEW IF NOT EXISTS spans_num_attrs_mv TO spans_num_attrs_local (organization_id UInt64, trace_id UUID, project_id UInt64, attr_key String, attr_value Float64, timestamp DateTime CODEC (ZSTD(1)), retention_days UInt16, duration_ms SimpleAggregateFunction(max, UInt32), count SimpleAggregateFunction(sum, UInt64)) AS \nSELECT\n organization_id,\n project_id,\n trace_id,\n attrs.1 as attr_key,\n attrs.2 as attr_value,\n toStartOfDay(_sort_timestamp) AS timestamp,\n retention_days,\n 1 AS count,\n maxSimpleState(duration_ms)\nFROM eap_spans_local\nLEFT ARRAY JOIN\n arrayConcat(CAST(attr_num_0, 'Array(Tuple(String, Float64))'),CAST(attr_num_1, 'Array(Tuple(String, Float64))'),CAST(attr_num_2, 'Array(Tuple(String, Float64))'),CAST(attr_num_3, 'Array(Tuple(String, Float64))'),CAST(attr_num_4, 'Array(Tuple(String, Float64))'),CAST(attr_num_5, 'Array(Tuple(String, Float64))'),CAST(attr_num_6, 'Array(Tuple(String, Float64))'),CAST(attr_num_7, 'Array(Tuple(String, Float64))'),CAST(attr_num_8, 'Array(Tuple(String, Float64))'),CAST(attr_num_9, 'Array(Tuple(String, Float64))'),CAST(attr_num_10, 'Array(Tuple(String, Float64))'),CAST(attr_num_11, 'Array(Tuple(String, Float64))'),CAST(attr_num_12, 'Array(Tuple(String, Float64))'),CAST(attr_num_13, 'Array(Tuple(String, Float64))'),CAST(attr_num_14, 'Array(Tuple(String, Float64))'),CAST(attr_num_15, 'Array(Tuple(String, Float64))'),CAST(attr_num_16, 'Array(Tuple(String, Float64))'),CAST(attr_num_17, 'Array(Tuple(String, Float64))'),CAST(attr_num_18, 'Array(Tuple(String, Float64))'),CAST(attr_num_19, 'Array(Tuple(String, Float64))')) AS attrs\nGROUP BY\n organization_id,\n project_id,\n trace_id,\n attr_key,\n attr_value,\n timestamp,\n retention_days\n;\nNone", "severity": "error", "exception": "Traceback (most recent call last):\n 
File \"/usr/src/snuba/snuba/clickhouse/native.py\", line 208, in execute\n result_data = query_execute()\n ^^^^^^^^^^^^^^^\n File \"/usr/src/snuba/snuba/clickhouse/native.py\", line 191, in query_execute\n return conn.execute( # type: ignore\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py\", line 382, in execute\n rv = self.process_ordinary_query(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py\", line 580, in process_ordinary_query\n return self.receive_result(with_column_types=with_column_types,\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/sentry_sdk/integrations/clickhouse_driver.py\", line 112, in _inner_end\n res = f(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py\", line 212, in receive_result\n return result.get_result()\n ^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/clickhouse_driver/result.py\", line 50, in get_result\n for packet in self.packet_generator:\n File \"/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py\", line 228, in packet_generator\n packet = self.receive_packet()\n ^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py\", line 245, in receive_packet\n raise packet.exception\nclickhouse_driver.errors.ServerException: Code: 8.\nDB::Exception: SELECT query outputs column with name 'maxSimpleState(duration_ms)', which is not found in the target table. Use 'AS' to assign alias that matches a column name. Stack trace:\n\n0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000fbf62bb\n1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009d94a8c\n2. 
DB::Exception::Exception<String const&>(int, FormatStringHelperImpl<std::type_identity<String const&>::type>, String const&) @ 0x000000000a6bb3ab\n3. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x0000000013d92877\n4. DB::InterpreterCreateQuery::execute() @ 0x0000000013da4a18\n5. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::shared_ptr<DB::IAST>&) @ 0x00000000141b2f29\n6. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x00000000141ac45a\n7. DB::TCPHandler::runImpl() @ 0x000000001559a6cc\n8. DB::TCPHandler::run() @ 0x00000000155b9ff8\n9. Poco::Net::TCPServerConnection::start() @ 0x0000000018d065a7\n10. Poco::Net::TCPServerDispatcher::run() @ 0x0000000018d069f9\n11. Poco::PooledThread::run() @ 0x0000000018cd19fb\n12. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000018ccfedd\n13. ? @ 0x00000000000891f5\n14. ? @ 0x0000000000108b00\n\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/usr/src/snuba/snuba/migrations/operations.py\", line 101, in execute\n connection.execute(self.format_sql(), settings=self._settings)\n File \"/usr/src/snuba/snuba/clickhouse/native.py\", line 293, in execute\n raise ClickhouseError(e.message, code=e.code) from e\nsnuba.clickhouse.errors.ClickhouseError: DB::Exception: SELECT query outputs column with name 'maxSimpleState(duration_ms)', which is not found in the target table. Use 'AS' to assign alias that matches a column name. Stack trace:\n\n0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000fbf62bb\n1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009d94a8c\n2. DB::Exception::Exception<String const&>(int, FormatStringHelperImpl<std::type_identity<String const&>::type>, String const&) @ 0x000000000a6bb3ab\n3. 
DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x0000000013d92877\n4. DB::InterpreterCreateQuery::execute() @ 0x0000000013da4a18\n5. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::shared_ptr<DB::IAST>&) @ 0x00000000141b2f29\n6. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x00000000141ac45a\n7. DB::TCPHandler::runImpl() @ 0x000000001559a6cc\n8. DB::TCPHandler::run() @ 0x00000000155b9ff8\n9. Poco::Net::TCPServerConnection::start() @ 0x0000000018d065a7\n10. Poco::Net::TCPServerDispatcher::run() @ 0x0000000018d069f9\n11. Poco::PooledThread::run() @ 0x0000000018cd19fb\n12. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000018ccfedd\n13. ? @ 0x00000000000891f5\n14. ? @ 0x0000000000108b00\n", "timestamp": "2025-07-31T13:29:26.495004Z"}
2025-07-31T13:29:26.526909388Z Traceback (most recent call last):
2025-07-31T13:29:26.526916748Z File "/usr/src/snuba/snuba/clickhouse/native.py", line 208, in execute
2025-07-31T13:29:26.527002877Z result_data = query_execute()
2025-07-31T13:29:26.527100225Z ^^^^^^^^^^^^^^^
2025-07-31T13:29:26.527103755Z File "/usr/src/snuba/snuba/clickhouse/native.py", line 191, in query_execute
2025-07-31T13:29:26.527131185Z return conn.execute( # type: ignore
2025-07-31T13:29:26.527209464Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.527213304Z File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 382, in execute
2025-07-31T13:29:26.527251683Z rv = self.process_ordinary_query(
2025-07-31T13:29:26.527292862Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.527299062Z File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 580, in process_ordinary_query
2025-07-31T13:29:26.527414121Z return self.receive_result(with_column_types=with_column_types,
2025-07-31T13:29:26.527463260Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.527478300Z File "/usr/local/lib/python3.11/site-packages/sentry_sdk/integrations/clickhouse_driver.py", line 112, in _inner_end
2025-07-31T13:29:26.527522519Z res = f(*args, **kwargs)
2025-07-31T13:29:26.527551469Z ^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.527560989Z File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 212, in receive_result
2025-07-31T13:29:26.527633077Z return result.get_result()
2025-07-31T13:29:26.527678867Z ^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.527685007Z File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/result.py", line 50, in get_result
2025-07-31T13:29:26.527698267Z for packet in self.packet_generator:
2025-07-31T13:29:26.527701247Z File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 228, in packet_generator
2025-07-31T13:29:26.527784965Z packet = self.receive_packet()
2025-07-31T13:29:26.527813015Z ^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.527817065Z File "/usr/local/lib/python3.11/site-packages/clickhouse_driver/client.py", line 245, in receive_packet
2025-07-31T13:29:26.527892214Z raise packet.exception
2025-07-31T13:29:26.527895604Z clickhouse_driver.errors.ServerException: Code: 8.
2025-07-31T13:29:26.527898984Z DB::Exception: SELECT query outputs column with name 'maxSimpleState(duration_ms)', which is not found in the target table. Use 'AS' to assign alias that matches a column name. Stack trace:
2025-07-31T13:29:26.527901654Z
2025-07-31T13:29:26.527904734Z 0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000fbf62bb
2025-07-31T13:29:26.527907534Z 1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009d94a8c
2025-07-31T13:29:26.527910643Z 2. DB::Exception::Exception<String const&>(int, FormatStringHelperImpl<std::type_identity<String const&>::type>, String const&) @ 0x000000000a6bb3ab
2025-07-31T13:29:26.527913243Z 3. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x0000000013d92877
2025-07-31T13:29:26.527915893Z 4. DB::InterpreterCreateQuery::execute() @ 0x0000000013da4a18
2025-07-31T13:29:26.527918743Z 5. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::shared_ptr<DB::IAST>&) @ 0x00000000141b2f29
2025-07-31T13:29:26.527921693Z 6. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x00000000141ac45a
2025-07-31T13:29:26.527924303Z 7. DB::TCPHandler::runImpl() @ 0x000000001559a6cc
2025-07-31T13:29:26.527929413Z 8. DB::TCPHandler::run() @ 0x00000000155b9ff8
2025-07-31T13:29:26.527932033Z 9. Poco::Net::TCPServerConnection::start() @ 0x0000000018d065a7
2025-07-31T13:29:26.527934623Z 10. Poco::Net::TCPServerDispatcher::run() @ 0x0000000018d069f9
2025-07-31T13:29:26.527937313Z 11. Poco::PooledThread::run() @ 0x0000000018cd19fb
2025-07-31T13:29:26.527939973Z 12. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000018ccfedd
2025-07-31T13:29:26.527942573Z 13. ? @ 0x00000000000891f5
2025-07-31T13:29:26.527945123Z 14. ? @ 0x0000000000108b00
2025-07-31T13:29:26.527947543Z
2025-07-31T13:29:26.527950473Z
2025-07-31T13:29:26.527953213Z The above exception was the direct cause of the following exception:
2025-07-31T13:29:26.527955673Z
2025-07-31T13:29:26.527958593Z Traceback (most recent call last):
2025-07-31T13:29:26.527961213Z File "/usr/local/bin/snuba", line 33, in <module>
2025-07-31T13:29:26.527970843Z sys.exit(load_entry_point('snuba', 'console_scripts', 'snuba')())
2025-07-31T13:29:26.528003502Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.528015782Z File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
2025-07-31T13:29:26.528156470Z return self.main(*args, **kwargs)
2025-07-31T13:29:26.528200169Z ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.528203469Z File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
2025-07-31T13:29:26.528333297Z rv = self.invoke(ctx)
2025-07-31T13:29:26.528365707Z ^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.528368657Z File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
2025-07-31T13:29:26.528549664Z return _process_result(sub_ctx.command.invoke(sub_ctx))
2025-07-31T13:29:26.528599233Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.528621883Z File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
2025-07-31T13:29:26.528779681Z return _process_result(sub_ctx.command.invoke(sub_ctx))
2025-07-31T13:29:26.528832670Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.528872779Z File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
2025-07-31T13:29:26.528990728Z return ctx.invoke(self.callback, **ctx.params)
2025-07-31T13:29:26.529044097Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.529054287Z File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
2025-07-31T13:29:26.529149975Z return __callback(*args, **kwargs)
2025-07-31T13:29:26.529186485Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-07-31T13:29:26.529189705Z File "/usr/src/snuba/snuba/cli/migrations.py", line 115, in migrate
2025-07-31T13:29:26.529248874Z runner.run_all(
2025-07-31T13:29:26.529251934Z File "/usr/src/snuba/snuba/migrations/runner.py", line 255, in run_all
2025-07-31T13:29:26.529288153Z self._run_migration_impl(
2025-07-31T13:29:26.529313733Z File "/usr/src/snuba/snuba/migrations/runner.py", line 338, in _run_migration_impl
2025-07-31T13:29:26.529368202Z migration.forwards(context, dry_run, columns_states)
2025-07-31T13:29:26.529371332Z File "/usr/src/snuba/snuba/migrations/migration.py", line 170, in forwards
2025-07-31T13:29:26.529455781Z op.execute()
2025-07-31T13:29:26.529459541Z File "/usr/src/snuba/snuba/migrations/operations.py", line 101, in execute
2025-07-31T13:29:26.529462331Z connection.execute(self.format_sql(), settings=self._settings)
2025-07-31T13:29:26.529492260Z File "/usr/src/snuba/snuba/clickhouse/native.py", line 293, in execute
2025-07-31T13:29:26.529546880Z raise ClickhouseError(e.message, code=e.code) from e
2025-07-31T13:29:26.529550500Z snuba.clickhouse.errors.ClickhouseError: DB::Exception: SELECT query outputs column with name 'maxSimpleState(duration_ms)', which is not found in the target table. Use 'AS' to assign alias that matches a column name. Stack trace:
2025-07-31T13:29:26.529553260Z
2025-07-31T13:29:26.529556320Z 0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000fbf62bb
2025-07-31T13:29:26.529558939Z 1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009d94a8c
2025-07-31T13:29:26.529562049Z 2. DB::Exception::Exception<String const&>(int, FormatStringHelperImpl<std::type_identity<String const&>::type>, String const&) @ 0x000000000a6bb3ab
2025-07-31T13:29:26.529564559Z 3. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x0000000013d92877
2025-07-31T13:29:26.529567249Z 4. DB::InterpreterCreateQuery::execute() @ 0x0000000013da4a18
2025-07-31T13:29:26.529570389Z 5. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::shared_ptr<DB::IAST>&) @ 0x00000000141b2f29
2025-07-31T13:29:26.529583719Z 6. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x00000000141ac45a
2025-07-31T13:29:26.529586589Z 7. DB::TCPHandler::runImpl() @ 0x000000001559a6cc
2025-07-31T13:29:26.529589539Z 8. DB::TCPHandler::run() @ 0x00000000155b9ff8
2025-07-31T13:29:26.529592179Z 9. Poco::Net::TCPServerConnection::start() @ 0x0000000018d065a7
2025-07-31T13:29:26.529594809Z 10. Poco::Net::TCPServerDispatcher::run() @ 0x0000000018d069f9
2025-07-31T13:29:26.529597289Z 11. Poco::PooledThread::run() @ 0x0000000018cd19fb
2025-07-31T13:29:26.529599929Z 12. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000018ccfedd
2025-07-31T13:29:26.529602739Z 13. ? @ 0x00000000000891f5
2025-07-31T13:29:26.529605279Z 14. ? @ 0x0000000000108b00
2025-07-31T13:29:26.529613709Z
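The error message in the log says the SELECT outputs a column named 'maxSimpleState(duration_ms)' that does not exist in the target table, and suggests adding an alias. The following is a minimal sketch of that name-matching rule as worded in the message, not of ClickHouse's or Snuba's actual implementation. The helper names (output_name, unmatched_columns) are hypothetical, the column and SELECT lists are copied from the failing statement, and the `duration_ms` alias is an assumption about what a fix could look like, not a confirmed patch.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical helpers, not part of Snuba or ClickHouse: they only mimic the
# name-matching rule described by the error message in the log above.


def output_name(expression: str, alias: Optional[str]) -> str:
    """Name a SELECT item exposes: its alias if given, else the expression text."""
    return alias if alias is not None else expression


def unmatched_columns(
    select_items: Iterable[Tuple[str, Optional[str]]],
    target_columns: Iterable[str],
) -> List[str]:
    """Return SELECT output names that have no same-named target column."""
    target = set(target_columns)
    names = (output_name(expr, alias) for expr, alias in select_items)
    return [name for name in names if name not in target]


# Columns of spans_num_attrs_local as declared in the failing statement.
TARGET_COLUMNS = [
    "organization_id", "trace_id", "project_id", "attr_key", "attr_value",
    "timestamp", "retention_days", "duration_ms", "count",
]

# SELECT items from the failing statement as (expression, alias) pairs.
FAILING_SELECT = [
    ("organization_id", None),
    ("project_id", None),
    ("trace_id", None),
    ("attrs.1", "attr_key"),
    ("attrs.2", "attr_value"),
    ("toStartOfDay(_sort_timestamp)", "timestamp"),
    ("retention_days", None),
    ("1", "count"),
    ("maxSimpleState(duration_ms)", None),  # no alias: the item named in the error
]

# Same items, with the alias the error message asks for (assumed fix).
ALIASED_SELECT = FAILING_SELECT[:-1] + [
    ("maxSimpleState(duration_ms)", "duration_ms"),
]

print(unmatched_columns(FAILING_SELECT, TARGET_COLUMNS))  # ['maxSimpleState(duration_ms)']
print(unmatched_columns(ALIASED_SELECT, TARGET_COLUMNS))  # []
```

Under this reading, the only unmatched output is the unaliased maxSimpleState(duration_ms) expression; every other SELECT item already carries a name that appears in the target table.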
I think I've run into the same problem with a self-hosted Sentry setup and a separate ClickHouse instance. It seems to be caused by a change in ClickHouse 25.4. At least in my case, switching to the ClickHouse version that the Sentry self-hosted setup uses (25.3) fixed the problem.