Resolve Change Relation Type error between Distributed and non-distributed Materializations
Summary
resolve #205
Changing the relation type from a non-distributed materialization to a distributed one (or vice versa) produces confusing error messages, and `--full-refresh` does not help.
Changes
- Added a macro `validate_relation_existence` to:
  - raise an error when the existing relation's on-cluster status is not as expected (e.g. a relation first created with the `table` materialization and later changed to `distributed_table`)
  - drop the existing relation when the `--full-refresh` flag is provided
  - drop the existing relation when the relation type changes (view to table or otherwise), since the `view` materialization is created on cluster by default
- Added a test in `test_changing_relation_type.py`
- Fixed the `clickhouse__list_relations_without_caching` macro: if the cluster has only one node, `is_on_cluster` is set to true.
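A rough sketch of the validation described above (the macro body, attribute names, and error wording here are illustrative assumptions, not the actual implementation):

```sql
{% macro validate_relation_existence(existing_relation, target_relation) %}
  {# Illustrative sketch: compare the on-cluster status expected by the
     configured materialization with that of the relation already in the
     database, and fail early (or drop) instead of surfacing a confusing
     DDL error later. #}
  {%- if existing_relation is not none
        and existing_relation.should_on_cluster != target_relation.should_on_cluster -%}
    {%- if flags.FULL_REFRESH -%}
      {# --full-refresh: drop the mismatched relation so it can be recreated #}
      {% do adapter.drop_relation(existing_relation) %}
    {%- else -%}
      {% do exceptions.raise_compiler_error(
          "On-cluster status of " ~ existing_relation ~
          " does not match the configured materialization; " ~
          "run with --full-refresh to recreate it.") %}
    {%- endif -%}
  {%- endif -%}
{% endmacro %}
```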
Checklist
Delete items not relevant to your PR:
- [X] Unit and integration tests covering the common scenarios were added
- [X] A human-readable description of the changes was provided to include in CHANGELOG
Thanks for the investigation here @gfunc. I think we should limit the effect to distributed models only, since I'm uncomfortable adding a lot of code (and extra SQL queries) to validate that the table exists in the non-distributed case. In theory the state should be totally controlled by dbt. If we take the validation piece out of the main table materialization, does a full refresh still create a "clean" state for the non-distributed case?
Hi @genzgd, Thanks for your comment.
I think the answer is no. To my understanding, the `table` materialization (not `incremental`) is currently barely affected by the `--full-refresh` flag, except for grants. I will verify that shortly.
Also, maybe some extra SQL queries to handle unexpected existing relations are reasonable even in a dbt-controlled environment? I added only one extra query to handle `--full-refresh`; existing relations are loaded from the cache, so performance is not affected.
Hi @gfunc,
As a start, can we limit the changes to the distributed cases only, as @genzgd suggested? (And add a feature flag / config for the validation part?)
Hi @BentsiLeviav,
That could be managed; I can drop those changes from the `table` and `incremental` materializations.
But IMO the main problem is that we should make `--full-refresh` effective in situations where a relation's materialization is changed from `table` to `distributed_table` and vice versa.
Since the handling of the `--full-refresh` flag lives inside the materializations, do you have any suggestions?
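For reference, the pattern dbt materializations use to handle the flag looks roughly like this (a simplified sketch; the `distributed_table` internals are elided, and `load_cached_relation` / `should_full_refresh` are the dbt-core helpers I believe apply here):

```sql
{% materialization distributed_table, adapter='clickhouse' %}
  {# Resolve the existing relation from dbt's in-memory cache
     rather than issuing an extra query #}
  {%- set existing_relation = load_cached_relation(this) -%}

  {# should_full_refresh() combines the --full-refresh CLI flag
     with the model-level full_refresh config #}
  {%- if existing_relation is not none and should_full_refresh() -%}
    {% do adapter.drop_relation(existing_relation) %}
    {%- set existing_relation = none -%}
  {%- endif -%}

  {# ... create the distributed and underlying local tables ... #}
{% endmaterialization %}
```

Since each materialization checks the flag itself, making `--full-refresh` work across a `table` → `distributed_table` change would mean each materialization also has to recognize (and drop) a relation left behind by the other type.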