kedro-plugins
String interpolation in `ManagedTableDataSet` leads to error on Databricks
Description & context
The upsert method in `ManagedTableDataSet` uses string interpolation to pass the table name into the SQL query, see here. It does not use regular Python interpolation (an f-string, for example) but an internal PySpark mechanism that relies on a variable set in the Spark configuration.
For a reason I don't understand, this triggers a bug on Databricks: the interpolation is incorrect when the table name contains hourl (sic). Below is the result of a minimal example illustrating the issue.
I don't know why this interpolation mechanism was used. If you think we could replace it with f-string interpolation, I can open a PR to that effect.
Thanks in advance!
Steps to Reproduce
On Databricks, try the following:
full_table_location = "`my_catalog`.`my_schema`.`my_table_hourl`"
spark.conf.set("fullTableName", full_table_location)
spark.sql("SELECT * FROM ${fullTableName} LIMIT 1").display()
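For comparison, here is a minimal sketch of the f-string alternative I have in mind. The helper name `build_select_query` is hypothetical and only for illustration; it is not part of kedro-datasets. The query string is built entirely in Python, so Spark's `${...}` substitution is never involved. I am assuming the table name comes from trusted configuration, since an f-string does no escaping.

```python
def build_select_query(full_table_location: str, limit: int = 1) -> str:
    """Build the query with plain Python interpolation.

    Hypothetical helper: the table name is inserted by the f-string,
    so no Spark configuration variable is needed.
    """
    return f"SELECT * FROM {full_table_location} LIMIT {limit}"


full_table_location = "`my_catalog`.`my_schema`.`my_table_hourl`"
query = build_select_query(full_table_location)
print(query)

# On Databricks, where `spark` is predefined, the query would then be run as:
# spark.sql(query).display()
```

I have not checked whether the upsert statement in `ManagedTableDataSet` can be rewritten the same way; this only covers the minimal example above.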
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
- Kedro version used: 0.18.12
- Kedro datasets: 1.5.3
- Python version: 3.10
- Operating system and version: Databricks
Thanks @PetitLepton, I'm tagging this as a bug, but it might take us some time to triage it. Please bear with us in the meantime.
Is this a valid way to do interpolation? It works in config, but the example here is pure Python code. Could you use an f-string instead?