
[CT-975] [Feature] Parity on `create_table_as` for python and SQL model

Open ChenyuLInx opened this issue 3 years ago • 7 comments

Describe the feature

The spark__create_table_as macro supports options including partition_by, clustered_by, file_format, location_root, and more options defined in options_clause.

Right now, python models save everything in Delta format with default settings. We should reach parity for python models where possible and raise a clear error when a model runs with options that are not supported.
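For reference, SQL models already pass these options through the model config; a python model would ideally honor the same config. A minimal sketch of what that could look like (the model body, option values, and `location_root` path are illustrative assumptions, not the actual implementation):

```python
def model(dbt, session):
    # Hypothetical config: these options already work for SQL models via
    # spark__create_table_as; the goal is for python models to honor them too.
    dbt.config(
        materialized="table",
        file_format="delta",
        partition_by="date_day",
        clustered_by="customer_id",
        buckets=8,
        location_root="/mnt/tables",  # illustrative path
    )
    df = dbt.ref("upstream_model")  # hypothetical upstream model
    return df
```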

Motivation:

Users would be able to optimize the storage format based on how they use the table.

Acceptance criteria

Python models materialize the table with the correct options, and raise an error when an unsupported option is specified.
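One way to satisfy the error-handling half of this criterion is an explicit allowlist check before materializing. A minimal sketch, assuming a plain dict config; the function name and the exact set of supported options are assumptions, not the actual dbt-spark implementation:

```python
# Sketch: validate python-model table options, raising a clear error
# for anything the python materialization path does not support.
SUPPORTED_PYTHON_OPTIONS = {
    "file_format", "partition_by", "clustered_by", "buckets", "location_root",
}

def validate_python_model_config(config):
    """Return the config unchanged, or raise on unsupported options."""
    unsupported = sorted(set(config) - SUPPORTED_PYTHON_OPTIONS)
    if unsupported:
        raise ValueError(
            f"Options not supported for python models: {unsupported}"
        )
    return config
```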

Tests for the PR

You should add integration tests that run the table materialization with supported options, then check that the table has the intended properties; for example, SHOW PARTITIONS table (link) can be used to check partitions. You should also add tests to make sure we raise an error on unsupported options.

ChenyuLInx avatar Aug 02 '22 00:08 ChenyuLInx

Hi! After setting up my first python model in dbt, I found out that partitioning is not supported in dbt python models and found this issue. Are there any updates on this feature?

BulyginMaksim avatar Dec 23 '22 09:12 BulyginMaksim

I tried updating https://github.com/xg1990/dbt-spark/blob/feature/partition-for-py-model/dbt/include/spark/macros/materializations/table.sql, but got the following error with dbt-core 1.4.5 and dbt-spark 1.4.1:

07:12:24    '_MISSING_TYPE' object is not callable
07:12:24    
07:12:24    > in macro py_script_postfix (macros/python_model/python.sql)
07:12:24    > called by model py_part (models/raw/aks_logs/py_part.py)
07:12:24

xg1990 avatar Mar 23 '23 07:03 xg1990

I created my first python model with dbt, and it looks like the config for location_root and partition_by does nothing. I found this issue; are there any updates or progress on it?

talperetz1 avatar Mar 29 '23 13:03 talperetz1

Do you happen to have any updates on the location_root and partition_by issues?

AlbertoRguezConesa avatar Aug 01 '23 06:08 AlbertoRguezConesa

Same here! +1 to this, as it seems that at least the spark session adapter would have the partitionBy method available: https://sparkbyexamples.com/pyspark/pyspark-partitionby-example/
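For context, the DataFrameWriter API that a python materialization could call looks roughly like this. This is a sketch only: it requires a live SparkSession, and the table name, column name, and path are assumptions mapped onto the config options from the issue description:

```python
# Sketch: writing a python model's result DataFrame with partitioning,
# mirroring what spark__create_table_as does on the SQL side.
# Assumes `df` is the DataFrame returned by the model on a live session.
(
    df.write
    .format("delta")                          # file_format
    .partitionBy("date_day")                  # partition_by
    .option("path", "/mnt/tables/my_model")   # location_root + model name
    .mode("overwrite")
    .saveAsTable("my_schema.my_model")
)
```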

srggrs avatar Sep 15 '23 06:09 srggrs

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Mar 14 '24 01:03 github-actions[bot]

Any movement on this?