Invalid setting when creating temporary tables with `ON CLUSTER`
Hi!
When creating a temporary table with the cluster property set, dbt-clickhouse passes the `replicated_deduplication_window` setting even though the table uses `Engine=Memory`, where that setting is invalid. The logic that applies `replicated_deduplication_window` appears to check only the materialization type, not the engine, which it should.
I think the `update_model_settings` method should accept the engine type and only allow the settings that are available for each engine.
I am seeing this issue when running https://github.com/elementary-data/dbt-data-reliability, because it creates a temporary table when artifacts change.
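To illustrate the proposed fix: a minimal sketch of engine-aware settings filtering. The names and the per-engine setting lists below are illustrative, not the actual dbt-clickhouse API:

```python
# Hypothetical sketch of engine-aware settings filtering. The supported-settings
# map is illustrative and incomplete; it is not ClickHouse's actual settings matrix.
SUPPORTED_SETTINGS = {
    "Memory": set(),  # Memory tables accept no MergeTree-family settings
    "MergeTree": {"index_granularity"},
    "ReplicatedMergeTree": {"index_granularity", "replicated_deduplication_window"},
}

def update_model_settings(settings: dict, engine: str) -> dict:
    """Drop any setting the target engine does not support."""
    allowed = SUPPORTED_SETTINGS.get(engine, set())
    return {name: value for name, value in settings.items() if name in allowed}
```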
Steps to reproduce
- Set up a ClickHouse multi-node cluster.
- Configure `cluster: "{cluster}"` in profiles.yml.
- Run the elementary bootstrap model: `dbt run -m elementary`
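The failure can also be reproduced directly against the server, independent of dbt and elementary. A minimal sketch, assuming clickhouse-driver is installed and a server is reachable on localhost:9000:

```python
# Memory tables reject MergeTree-family settings, so this raises a
# ServerException: "Code: 115 ... Unknown setting
# 'replicated_deduplication_window' for storage Memory".
from clickhouse_driver import Client

client = Client(host="localhost", port=9000)
client.execute(
    "CREATE TEMPORARY TABLE repro_tmp "
    "ENGINE Memory "
    "SETTINGS replicated_deduplication_window=0 "
    "AS SELECT 1 AS x"
)
```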
Expected behaviour
No errors should occur. The temporary table should be created in memory on the connected node.
Code examples, such as models or profile settings
```yaml
# profiles.yml
project_name:
  target: local
  outputs:
    local:
      type: clickhouse
      driver: native
      schema: "{{ env_var('DBT_SCHEMA') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASS') }}"
      port: 9000
      cluster: "{cluster}"
      cluster_mode: true
      check_exchange: false
```
```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.2.0
  - package: elementary-data/elementary
    version: 0.15.2
```
```yaml
# dbt_project.yml
name: 'project_name'
version: '1.0.0'
profile: 'project_name'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
docs-paths: ["docs"]

clean-targets:
  - "target"
  - "dbt_packages"

dispatch:
  - macro_namespace: elementary
    search_order: ['project_name', 'elementary']

models:
  +engine: ReplicatedMergeTree
  elementary:
    +engine: ReplicatedMergeTree
    edr:
      system:
        enabled: false
      data_monitoring:
        enabled: false # Doesn't yet work because stddev is not available and cannot be overridden.

seeds:
  +engine: ReplicatedMergeTree

flags:
  require_explicit_package_overrides_for_builtin_materializations: false
  source_freshness_run_project_hooks: true
```
```sql
-- macros/current_timestamp.sql
{% macro clickhouse__edr_current_timestamp() -%}
  now(timezone())
{%- endmacro -%}

{% macro clickhouse__edr_current_timestamp_in_utc() -%}
  now()
{%- endmacro -%}
```
dbt and/or ClickHouse server logs
```
17:24:58.632265 [debug] [Thread-1 (]: dbt_clickhouse adapter: Error running SQL: /* {"app": "dbt", "dbt_version": "1.8.6", "profile_name": "opti_analytics", "target_name": "local", "node_id": "model.elementary.dbt_models"} */
create temporary table `dbt_models__tmp_20241010152458544783`
engine Memory
-- end_of_sql
SETTINGS replicated_deduplication_window=0
as (
    SELECT
        *
    FROM `project_name`.`dbt_models`
    WHERE 1 = 0
)
17:24:58.645419 [debug] [Thread-1 (]: Database Error in model dbt_models (models/edr/dbt_artifacts/dbt_models.sql)
  Code: 115.
  DB::Exception: Unknown setting 'replicated_deduplication_window' for storage Memory. Stack trace:

  0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000bc3f2c8
  1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000007a8dc7c
  2. DB::Exception::Exception<String>(int, FormatStringHelperImpl<std::type_identity<String>::type>, String&&) @ 0x0000000007a8d858
  3. DB::BaseSettingsHelpers::throwSettingNotFound(std::basic_string_view<char, std::char_traits<char>>) @ 0x000000000ea38e4c
  4. DB::BaseSettings<DB::memorySettingsTraits>::set(std::basic_string_view<char, std::char_traits<char>>, DB::Field const&) @ 0x000000000f1442c4
  5. DB::MemorySettings::loadFromQuery(DB::ASTStorage&) @ 0x000000000ff94e40
  6. std::shared_ptr<DB::IStorage> std::__function::__policy_invoker<std::shared_ptr<DB::IStorage> (DB::StorageFactory::Arguments const&)>::__call_impl<std::__function::__default_alloc_func<DB::registerStorageMemory(DB::StorageFactory&)::$_0, std::shared_ptr<DB::IStorage> (DB::StorageFactory::Arguments const&)>>(std::__function::__policy_storage const*, DB::StorageFactory::Arguments const&) (.llvm.1794116773610321499) @ 0x000000001008a0f8
  7. DB::StorageFactory::get(DB::ASTCreateQuery const&, String const&, std::shared_ptr<DB::Context>, std::shared_ptr<DB::Context>, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, DB::LoadingStrictnessLevel) const @ 0x000000000fffac7c
  8. std::shared_ptr<DB::IStorage> std::__function::__policy_invoker<std::shared_ptr<DB::IStorage> (DB::StorageID const&)>::__call_impl<std::__function::__default_alloc_func<DB::InterpreterCreateQuery::doCreateTable(DB::ASTCreateQuery&, DB::InterpreterCreateQuery::TableProperties const&, std::unique_ptr<DB::DDLGuard, std::default_delete<DB::DDLGuard>>&, DB::LoadingStrictnessLevel)::$_0, std::shared_ptr<DB::IStorage> (DB::StorageID const&)>>(std::__function::__policy_storage const*, DB::StorageID const&) @ 0x000000000f609ac4
  9. DB::TemporaryTableHolder::TemporaryTableHolder(std::shared_ptr<DB::Context const>, std::function<std::shared_ptr<DB::IStorage> (DB::StorageID const&)> const&, std::shared_ptr<DB::IAST> const&) @ 0x000000000f12ed98
  10. DB::InterpreterCreateQuery::doCreateTable(DB::ASTCreateQuery&, DB::InterpreterCreateQuery::TableProperties const&, std::unique_ptr<DB::DDLGuard, std::default_delete<DB::DDLGuard>>&, DB::LoadingStrictnessLevel) @ 0x000000000f6011b0
  11. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x000000000f5fbeec
  12. DB::InterpreterCreateQuery::execute() @ 0x000000000f607660
  13. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x000000000fbf0d58
  14. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x000000000fbed2a8
  15. DB::TCPHandler::runImpl() @ 0x0000000010a02904
  16. DB::TCPHandler::run() @ 0x0000000010a19348
  17. Poco::Net::TCPServerConnection::start() @ 0x0000000012d464b8
  18. Poco::Net::TCPServerDispatcher::run() @ 0x0000000012d46994
  19. Poco::PooledThread::run() @ 0x0000000012e3e718
  20. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000012e3cb10
  21. start_thread @ 0x0000000000007624
  22. ? @ 0x00000000000d162c
```
Configuration
Environment
- dbt version: 1.8.6
- dbt-clickhouse version: 1.8.7
- clickhouse-driver version (if using native): 24.4.1.914
- Python version: Python 3.11.10
- Operating system: running on the `clickhouse-server` docker image
ClickHouse server
- ClickHouse Server version: 24.5.8
- `CREATE TABLE` statements for tables involved: see above
Is there any reason you can't just use a MergeTree table? Temporary tables have all kinds of gotchas, especially in a clustered environment, in ClickHouse Cloud, or if your server is behind a load balancer.
> Is there any reason you can't just use a MergeTree table? Temporary tables have all kinds of gotchas, especially in a clustered environment, in ClickHouse Cloud, or if your server is behind a load balancer.
We have a multi-node cluster, so I would like the uploaded results to be accessible on all nodes; otherwise every client must be configured to point at one specific node. These tables are created as an intermediate step by elementary.
Besides, even using MergeTree would not solve the actual issue, which is a bug: the temporary tables are created by the elementary package, not explicitly by my dbt project. Elementary is currently the most compatible observability toolkit for dbt, since dbt_artifacts does not support ClickHouse; the alternative is to write our own package, which I would like to avoid.
That sounds like an incompatibility in the elementary package. The data in a temporary (in memory) table on one node will not be available on other nodes in the cluster. So anything you do in a temporary table will be by definition "not accessible on all nodes", and if that package requires temporary tables it's extremely unlikely it will work on a clustered ClickHouse installation.
> That sounds like an incompatibility in the elementary package.
I am not contesting this, but it seems like a red herring in this context. Either `clickhouse__create_table_as` should not allow the temporary flag at all, or it should parse the settings and pass only those valid for the engine (which is the bug described in this issue).
> The data in a temporary (in memory) table on one node will not be available on other nodes in the cluster. So anything you do in a temporary table will be by definition "not accessible on all nodes", and if that package requires temporary tables it's extremely unlikely it will work on a clustered ClickHouse installation.
The final tables are created as clustered tables; only the intermediate results are stored in a (supposedly temporary) table. The flow is: run_results.json is uploaded to any node, transformed on that same node (supposedly in a temporary table), and the results are written to a clustered table. Since the artifact data is small and fine to keep local, it does not matter which node processes it; what matters is where the final results end up, which in this case is a ReplicatedMergeTree.
It would be really great if this could be resolved, since it is seemingly the only issue preventing some (limited) observability on our dbt stack using elementary and ClickHouse.
Fair point on the temporary table code. It long predates the cluster functionality and is definitely a likely source of future bugs when dbt either runs in cluster mode or hits a different back-end ClickHouse node due to load balancing, etc. Nevertheless, you might want to open an issue in the elementary project to allow a mode that doesn't rely on temporary tables (and instead cleans up intermediate tables, as dbt-clickhouse does for incremental models).
You are correct, it's technically a bug. It might get fixed faster if you submit a PR. :)
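For reference, a rough sketch of the pattern suggested above: a real, cluster-visible intermediate table that is dropped after use, instead of a session-local temporary table. All names and the schema are illustrative; it assumes clickhouse-driver, a configured `cluster` macro, and a server recent enough to support argument-less ReplicatedMergeTree via the default replication-path macros:

```python
# Sketch of the "intermediate table instead of temporary table" pattern,
# mirroring how dbt-clickhouse cleans up intermediates for incremental models.
# Database, table, and column names are illustrative.
from clickhouse_driver import Client

client = Client(host="localhost", port=9000)
intermediate = "db.dbt_models__intermediate"
try:
    # A ReplicatedMergeTree created ON CLUSTER is visible on every node,
    # unlike an in-memory temporary table bound to a single session.
    client.execute(
        f"CREATE TABLE IF NOT EXISTS {intermediate} ON CLUSTER '{{cluster}}' "
        "(run_id String, payload String) "
        "ENGINE = ReplicatedMergeTree ORDER BY run_id"
    )
    # ... populate and transform the intermediate results here ...
finally:
    client.execute(f"DROP TABLE IF EXISTS {intermediate} ON CLUSTER '{{cluster}}' SYNC")
```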
I'm also encountering this issue when using the S3 engine. My guess is that this isn't officially supported, but I've found the plugin generates DDL that would otherwise create tables using the S3 engine, were it not for it always forcing `replicated_deduplication_window = 0`.
Could we simply add an opt-out for the setting? In our use case, `replicated_deduplication_window` never needs to be set; it's not a relevant setting for our environment.
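A minimal sketch of what such an opt-out might look like; the config key name is purely hypothetical:

```python
# Hypothetical opt-out: skip injecting the deduplication setting when the user
# disables it. The key "force_replicated_deduplication" is illustrative only.
def settings_clause(model_config: dict) -> str:
    if not model_config.get("force_replicated_deduplication", True):
        return ""
    return "SETTINGS replicated_deduplication_window=0"
```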
With the fix merged in https://github.com/ClickHouse/dbt-clickhouse/pull/504, this should already be fixed in v1.9.3.
Please reopen if you are still encountering it.