dbt-databricks icon indicating copy to clipboard operation
dbt-databricks copied to clipboard

[bug] Composite primary key constraint fails

Open kmarq opened this issue 1 year ago • 2 comments
trafficstars

Describe the bug

Unable to apply a multiple column primary key

Steps To Reproduce

Simple model:

select
'a' as col_1,
'b' as col_2

adding col_1 as the PK works as expected.

models:
  - name: model_a
    config:
      contract:
        enforced: true
    columns:
      - name: col_1
        data_type: string
        constraints:
          - type: not_null
          - type: primary_key
      - name: col_2
        data_type: string
        constraints:
          - type: not_null
models:
  - name: model_a
    config:
      contract:
        enforced: true
    columns:
      - name: col_1
        data_type: string
        constraints:
          - type: not_null
      - name: col_2
        data_type: string
        constraints:
          - type: not_null
    constraints:
      - type: primary_key
        columns: [col_1, col_2]   

However, updating to try and use a composite primary key does not work. In the second scenario it tries to create the primary key

alter table model_a add constraint 0a36b673f037eb84f703cefef2531521 primary key(col_1, col_2);

However, it is not setting the two columns as not_null first and so creation of the PK fails.

: Cannot create the primary key 0a36b673f037eb84f703cefef2531521 because its child column(s) col_1 is nullable. Please change the column nullability and retry.

Expected behavior

The columns are set to NOT NULL before the primary key is applied

Screenshots and log output

`       create or replace table `model_a`
 
    using delta
      as
    select col_1, col_2
    from (
select
'a' as col_1,
'b' as col_2
    ) as model_subq`

alter table `model_a` add constraint 0a36b673f037eb84f703cefef2531521 primary key(col_1, col_2);
      
18:59:38 Databricks adapter: Cursor(session-id=01ef186d-6818-15d1-b37f-1642bf4c4ff9, command-id=Unknown) - Closing cursor
18:59:38 Databricks adapter: Exception while trying to execute query
/* {"app": "dbt", "dbt_version": "1.7.14", "dbt_databricks_version": "1.7.15", "databricks_sql_connector_version": "2.9.6", "profile_name": "user", "target_name": "dev", "node_id": "model.databricks_analytics_dbt.model_a"} */

        alter table `model_a` add constraint 0a36b673f037eb84f703cefef2531521 primary key(col_1, col_2);
      
: Cannot create the primary key `0a36b673f037eb84f703cefef2531521` because its child column(s) `col_1` is nullable. Please change the column nullability and retry.

System information

The output of dbt --version:

18:59:18 Running with dbt=1.7.14
18:59:18 Registered adapter: databricks=1.7.15

DBT Cloud

The output of python --version:

Additional context

Add any other context about the problem here.

kmarq avatar May 22 '24 19:05 kmarq

Do you happen to know if you see the same behavior without using a composite key? It seems from your description that the core issue is that not-null constraints need to be applied first, which I would think would be a problem regardless. Thanks for reporting.

benc-db avatar May 23 '24 16:05 benc-db

The same error occurs if I just have col_1 in the primary key.

So really its the difference of primary_key being added in the column constraints definition, which works correctly and applies not_null then primary_key. Compared to putting it at the model level constraints which is not first applying the not_null property. So it would seem order of operations would need to be apply column level constraints, then model level ones. I'm not sure if that breaks for any other combinations though.

kmarq avatar May 23 '24 17:05 kmarq

Probably stating the obvious, but can confirm this is also present in 1.8.3

frankivo avatar Jul 12 '24 09:07 frankivo

There is actual code to make sure the not_null is applied first: https://github.com/databricks/dbt-databricks/blob/ec63dce4ff287756ce1821d4f6d2f36b756fb8b2/dbt/include/databricks/macros/relations/constraints.sql#L82

I think it sorts by alphabet so that not_null comes before primary_key 🤐

Main problem for this specific bug: https://github.com/databricks/dbt-databricks/blob/ec63dce4ff287756ce1821d4f6d2f36b756fb8b2/dbt/include/databricks/macros/relations/constraints.sql#L22 Table constraints are applied before column constraints.

frankivo avatar Jul 12 '24 11:07 frankivo