dbt-spark
[CT-975] [Feature] Parity on `create_table_as` for python and SQL model
Describe the feature
The spark__create_table_as macro supports options including partition_by, clustered_by, file_format, location_root, and the other options defined in options_clause.
Right now, python models always save the table in delta format with the default settings. We should reach parity for python models where possible, and raise a clear error when a model is run with options that are not supported.
Motivation:
Users would be able to optimize the storage format based on how they use the table.
Acceptance criteria
A python model should materialize the table with the specified options, and raise an error when an unsupported option is specified.
Tests for the PR
You should add integration tests that run the table materialization with supported options and then check that the table has the intended properties; for example, SHOW PARTITIONS table (link) can be used to check partitions. You should also add tests to make sure we raise an error on unsupported options.
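A minimal sketch of the behavior the acceptance criteria describe: map a python model's table config onto Spark writer options and raise a clear error for anything the python path cannot honor. All names here (SUPPORTED_OPTIONS, build_writer_options) are illustrative assumptions, not the dbt-spark API.

```python
# Hypothetical config-to-writer-options mapping for python models.
# Supported keys mirror the options spark__create_table_as handles;
# anything else raises instead of being silently dropped.
SUPPORTED_OPTIONS = {"partition_by", "clustered_by", "file_format", "location_root"}


def build_writer_options(config: dict) -> dict:
    """Return DataFrameWriter kwargs for supported options; raise on the rest."""
    unsupported = set(config) - SUPPORTED_OPTIONS
    if unsupported:
        raise ValueError(
            f"Options not supported for python models: {sorted(unsupported)}"
        )
    opts = {}
    if "partition_by" in config:
        cols = config["partition_by"]
        # Accept a single column name or a list of columns.
        opts["partitionBy"] = [cols] if isinstance(cols, str) else list(cols)
    if "file_format" in config:
        opts["format"] = config["file_format"]
    if "location_root" in config:
        opts["path"] = config["location_root"]
    return opts
```

An integration test along these lines would materialize a model with each supported option and assert the table property, plus one case asserting the error path.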
Hi! After setting up my first python model in dbt, I found out that partitioning is not supported in dbt python models, and then found this issue. Are there any updates on this feature?
I tried updating https://github.com/xg1990/dbt-spark/blob/feature/partition-for-py-model/dbt/include/spark/macros/materializations/table.sql, but got the following error with dbt-core 1.4.5 and dbt-spark 1.4.1:
07:12:24 '_MISSING_TYPE' object is not callable
07:12:24
07:12:24 > in macro py_script_postfix (macros/python_model/python.sql)
07:12:24 > called by model py_part (models/raw/aks_logs/py_part.py)
07:12:24
I created my first python model with dbt, and it looks like the config for location_root and partition_by does nothing. I found this issue; are there any updates or progress on it?
Do you happen to have any updates on the location_root and partition_by issues?
Same here, +1 to this. At least the spark session adapter seems to have the partitionBy method: https://sparkbyexamples.com/pyspark/pyspark-partitionby-example/
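For reference, the partitionBy call the comment above points at is a chained DataFrameWriter method in PySpark, roughly df.write.format(...).partitionBy(...).saveAsTable(...). The MiniWriter below is a stand-in that just records the calls so the sketch runs without a Spark session; it is not Spark code.

```python
# MiniWriter mimics the fluent DataFrameWriter chain for illustration only.
class MiniWriter:
    def __init__(self):
        self.calls = []

    def format(self, fmt):
        self.calls.append(("format", fmt))
        return self

    def partitionBy(self, *cols):
        self.calls.append(("partitionBy", cols))
        return self

    def saveAsTable(self, name):
        self.calls.append(("saveAsTable", name))
        return self.calls


# The shape a partitioned python-model write could take:
calls = MiniWriter().format("delta").partitionBy("dt").saveAsTable("py_part")
```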
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Any movement on this?