Table with Null engine not created on cluster
Describe the bug
Tables with null engines are not created using the ON CLUSTER clause
Steps to reproduce
- Attempt to create a relation with a null engine e.g.
{{
config(
engine="Null",
materialized="materialized_view",
schema="default",
cluster="default"
)
}}
select * from {{ ref("stg_testing") }}
- Materialized view + target table is only created on a single server, and not across the entire cluster
Expected behaviour
There should be an option to allow a null engine table to be created on cluster
Code examples, such as models or profile settings
Also tried defining the cluster via profiles.yml, but this makes no difference
I figure the quick fix for this is to update the logic in get_on_cluster():
@classmethod
def get_on_cluster(
cls: Type[Self], cluster: str = '', materialized: str = '', engine: str = ''
) -> bool:
if cluster.strip():
return (
materialized in ('view', 'dictionary')
or 'distributed' in materialized
or 'Replicated' in engine
+ or engine in ('Null')
)
else:
return False
Another relevant issue: https://github.com/ClickHouse/dbt-clickhouse/issues/237
The current on_cluster macro is really overloaded. It is a conditional macro that checks for things other than cluster.
For example, I wanted to use it in a post-hook for configuring Row Policies but basically couldn't.
Perhaps instead of adding more and more cases to the if statement, we can refactor it to work by default when called, and fix the problems from its misuse in other macros?
Or we should leave the current one as is since it is used everywhere, but make a real_on_cluster_macro that does not do any checking of the relation, just returns the cluster correctly formatted as 'ON CLUSTER <cluster>' if cluster exists... :D . The current one can use the real_on_cluster_macro under the hood. It should really be called conditional_on_cluster_macro since it conditionally returns on_cluster if some conditions not specified in the README are met.
This is a sincere suggestion, what do you think @benjamin-awd? Then you could use it with any type of Engine etc. and the conditional one can be used by those implementations that need more logic such as distributed/Replicated materializations..
@emirkmo I ended up creating a force_on_cluster config argument that should add the ON CLUSTER clause regardless of engine or materialization type in https://github.com/ClickHouse/dbt-clickhouse/pull/328
Perhaps instead of adding more and more cases to the if statement, we can refactor it to work by default when called, and fix the problems from its misuse in other macros?
This is a good point -- I prefer that option (something like this) that adds the ON CLUSTER clause to everything by default if the cluster argument is defined but I'm not sure what consequences that would have.
As a user, when I specify the cluster argument in the config, I'd expect my relations to be created on the cluster. If a user wants to only create a model on a single connected shard, then they should remove the cluster argument from their model (or profiles).