materialization_v2: Table with struct type column can not be build
Describe the bug
When I set the flag for the materialization_v2 to true in my project, one of my models does not build anymore. It is to note, that it has a struct-column with nested fields/arrays and is generally pretty huge.
Steps To Reproduce
Probably create a model with a struct column which is huge, with different data types and nested.
Expected behavior
The model should build successfully
Screenshots and log output
14:15:26 Finished running 2 incremental models, 33 seeds, 74 table models, 61 view models in 0 hours 5 minutes and 4.35 seconds (304.35s). 14:15:28
14:15:28 Completed with 1 error, 0 partial successes, and 0 warnings: 14:15:28
14:15:28 Failure in model offer_output (models/offer/offer_output.sql) 14:15:28 Database Error in model offer_output(models/offer/offer_output.sql)[PARSE_SYNTAX_ERROR] Syntax error at or near '.'. SQLSTATE: 42601 (line 18, pos 769)
[PARSE_SYNTAX_ERROR] Syntax error at or near '.'. SQLSTATE: 42601 (line 18, pos 769)
(I won't output the complete sql because it is internal, but the part which is marked as error) payload struct<created_at:date,[imagine here inbetween a big subset of columns],subvention:double,reference_date:date,... 9 more fields>, -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^^^
request string, ....
System information
The output of dbt --version:
dbt-databricks==1.10.3
dbt-core==1.10.1
The operating system you're using: It's running inside a azure databricks notebook; Custer Version 16.4 LTS
Additional context
Add any other context about the problem here.
Almost certainly its because Databricks is truncating the type. Can you try specifying your column data_types in yml, and for the column in question, set the type as variant?
@benc-db After defining the column schema in yml and set it to variant, it asked me to run first: ALTER TABLE table_name SET TBLPROPERTIES ('delta.feature.variantType-preview' = 'supported');
After I did that:
Database Error in model offer_output [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "payload" due to data type mismatch: cannot cast "STRUCT<reference_date: DATE ....... .to "VARIANT". SQLSTATE: 42K09; line 3 pos 4
Interesting to note is, that the error message actually prints the complete schema without truncating.
I tried to manually define the whole struct schema for the column, and this worked.
Even after dropping the table and running the model, the same error occurs with variant type.
That's a shame that variant doesn't work for this. Large structs are very difficult for us since most Databricks APIs end up truncating the type information.
@benc-db I am getting this issue with streaming materialization. Only happening with 1.10 though. Works fine with 1.9 and earlier. From the query history I see it is trying to put the schema in the create table as statement while older dbt versions are not doing that
@ericj-db we need to come up with a solution for this; the frustrating thing is we might need to special case for struct if we can't get the full type information where we need it.
We have the same issue, Materialization_v2 is failing with struct columns which has high number of nested fields