dbt-databricks icon indicating copy to clipboard operation
dbt-databricks copied to clipboard

materialization_v2: Table with struct type column can not be build

Open eckesru opened this issue 6 months ago • 6 comments

Describe the bug

When I set the flag for the materialization_v2 to true in my project, one of my models does not build anymore. It is to note, that it has a struct-column with nested fields/arrays and is generally pretty huge.

Steps To Reproduce

Probably create a model with a struct column which is huge, with different data types and nested.

Expected behavior

The model should build successfully

Screenshots and log output

14:15:26 Finished running 2 incremental models, 33 seeds, 74 table models, 61 view models in 0 hours 5 minutes and 4.35 seconds (304.35s). 14:15:28
14:15:28 Completed with 1 error, 0 partial successes, and 0 warnings: 14:15:28
14:15:28 Failure in model offer_output (models/offer/offer_output.sql) 14:15:28 Database Error in model offer_output(models/offer/offer_output.sql)

[PARSE_SYNTAX_ERROR] Syntax error at or near '.'. SQLSTATE: 42601 (line 18, pos 769)

[PARSE_SYNTAX_ERROR] Syntax error at or near '.'. SQLSTATE: 42601 (line 18, pos 769)

(I won't output the complete sql because it is internal, but the part which is marked as error) payload struct<created_at:date,[imagine here inbetween a big subset of columns],subvention:double,reference_date:date,... 9 more fields>, -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------^^^

request string, ....

System information

The output of dbt --version: dbt-databricks==1.10.3 dbt-core==1.10.1

The operating system you're using: It's running inside a azure databricks notebook; Custer Version 16.4 LTS

Additional context

Add any other context about the problem here.

eckesru avatar Jun 19 '25 14:06 eckesru

Almost certainly its because Databricks is truncating the type. Can you try specifying your column data_types in yml, and for the column in question, set the type as variant?

benc-db avatar Jun 20 '25 17:06 benc-db

@benc-db After defining the column schema in yml and set it to variant, it asked me to run first: ALTER TABLE table_name SET TBLPROPERTIES ('delta.feature.variantType-preview' = 'supported');

After I did that:

Database Error in model offer_output [DATATYPE_MISMATCH.CAST_WITHOUT_SUGGESTION] Cannot resolve "payload" due to data type mismatch: cannot cast "STRUCT<reference_date: DATE ....... .to "VARIANT". SQLSTATE: 42K09; line 3 pos 4

Interesting to note is, that the error message actually prints the complete schema without truncating.

I tried to manually define the whole struct schema for the column, and this worked.

Even after dropping the table and running the model, the same error occurs with variant type.

eckesru avatar Jun 20 '25 21:06 eckesru

That's a shame that variant doesn't work for this. Large structs are very difficult for us since most Databricks APIs end up truncating the type information.

benc-db avatar Jun 30 '25 18:06 benc-db

@benc-db I am getting this issue with streaming materialization. Only happening with 1.10 though. Works fine with 1.9 and earlier. From the query history I see it is trying to put the schema in the create table as statement while older dbt versions are not doing that

krom-ca avatar Jul 01 '25 14:07 krom-ca

@ericj-db we need to come up with a solution for this; the frustrating thing is we might need to special case for struct if we can't get the full type information where we need it.

benc-db avatar Jul 01 '25 16:07 benc-db

We have the same issue, Materialization_v2 is failing with struct columns which has high number of nested fields

HaykManukyanAvetiky avatar Jul 23 '25 15:07 HaykManukyanAvetiky