kedro-plugins String interpolation in `ManagedTableDataSet` leads to error on Databricks

Open PetitLepton opened this issue 2 years ago • 1 comments

Description & context

The upsert method in `ManagedTableDataSet` uses string interpolation to pass the table name into the SQL query, see here. This is not regular Python interpolation (an f-string, for example) but an internal PySpark mechanism that substitutes a variable set in the Spark configuration.

For a reason I don't understand, this leads to a strange bug on Databricks: the interpolation is incorrect when the table name contains `hourl` (sic). Below is the result of a minimal example illustrating the issue:

[Screenshot: Databricks notebook output of the minimal example, 2023-10-05]

I don't know why this interpolation mechanism was used. If you think we could replace it with f-string interpolation, I can open a PR to that effect.

Thanks in advance!

Steps to Reproduce

On Databricks, try the following:

```python
full_table_location = "`my_catalog`.`my_schema`.`my_table_hourl`"
spark.conf.set("fullTableName", full_table_location)
spark.sql("SELECT * FROM ${fullTableName} LIMIT 1").display()
```
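For comparison, here is a minimal sketch of the f-string alternative suggested in the description, applied to the same query. `build_select_query` is a hypothetical helper introduced only for illustration and is not part of kedro-plugins. The sketch assumes that formatting the table name in Python, before the SQL text reaches Spark, avoids the `${...}` substitution step entirely. Whether this resolves the behaviour seen on Databricks has not been verified here.

```python
def build_select_query(full_table_location: str, limit: int = 1) -> str:
    """Hypothetical helper: build the query with Python-side interpolation."""
    # The table name is formatted into the SQL text before Spark sees it,
    # so no ${...} lookup against the Spark configuration is involved.
    return f"SELECT * FROM {full_table_location} LIMIT {limit}"


full_table_location = "`my_catalog`.`my_schema`.`my_table_hourl`"
query = build_select_query(full_table_location)
print(query)

# On Databricks, where a `spark` session is predefined, this would be run as:
# spark.sql(query).display()
```

Because the table name is pasted into the SQL text verbatim, it should only come from trusted configuration such as the catalog entry, not from user input.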